pengfei (Pengfei Wang)
User

Projects

User does not belong to any projects.

User Details

User Since
Dec 12 2018, 5:57 PM (110 w, 6 d)

Recent Activity

Today

pengfei updated the diff for D95136: [X86] Fix tile config register spill issue..

Change enum names to make them more accurate.

Wed, Jan 27, 6:42 AM · Restricted Project
pengfei updated the diff for D95136: [X86] Fix tile config register spill issue..

A minor refactor.

Wed, Jan 27, 4:24 AM · Restricted Project
pengfei added inline comments to D95508: [X86] Fix tile config register spill issue for AMX.
Wed, Jan 27, 12:17 AM · Restricted Project

Yesterday

pengfei updated the diff for D95136: [X86] Fix tile config register spill issue..

Reduce BB status to 3 cases, simplify code based on it.

Tue, Jan 26, 11:16 PM · Restricted Project
pengfei accepted D94466: [X86] merge "={eax}" and "~{eax}" into "=&eax" for MSInlineASM.

LGTM.

Tue, Jan 26, 6:25 PM · Restricted Project
pengfei accepted D94614: [FPEnv][X86] Platform builtins edition: clang should get from the AST the metadata for constrained FP builtins.

LGTM. Thanks

Tue, Jan 26, 4:53 AM · Restricted Project
pengfei added inline comments to D94466: [X86] merge "={eax}" and "~{eax}" into "=&eax" for MSInlineASM.
Tue, Jan 26, 2:15 AM · Restricted Project

Mon, Jan 25

pengfei accepted D95421: [NFC] Refine some uninitialized used variables..

LGTM.

Mon, Jan 25, 11:13 PM · Restricted Project, Restricted Project
pengfei updated the diff for D95136: [X86] Fix tile config register spill issue..

Fix a bug when calculating indirect successors.

Mon, Jan 25, 8:53 AM · Restricted Project

Sun, Jan 24

pengfei updated the diff for D95136: [X86] Fix tile config register spill issue..

Small refactor: Change recursive DFS to iterative DFS when inserting tilerelease.

Sun, Jan 24, 6:14 PM · Restricted Project
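
The recursive-to-iterative DFS change mentioned in the entry above can be illustrated with a minimal sketch. This is not the code from D95136: the function name `dfs_preorder` and the dictionary-based control-flow graph are hypothetical, chosen only to show how an explicit stack replaces recursion.

```python
def dfs_preorder(successors, entry):
    """Visit every block reachable from `entry` depth-first, without recursion.

    `successors` is a hypothetical mapping from a block name to the list of
    its successor block names (a stand-in for a real control-flow graph).
    """
    visited = set()
    order = []
    stack = [entry]  # explicit stack replaces the call stack of a recursive DFS
    while stack:
        block = stack.pop()
        if block in visited:
            continue  # a block can be pushed more than once; visit it only once
        visited.add(block)
        order.append(block)
        # Push successors in reverse so the first successor is popped first,
        # which reproduces the visiting order of the recursive version.
        for succ in reversed(successors.get(block, [])):
            if succ not in visited:
                stack.append(succ)
    return order


# Usage: a diamond-shaped graph with a self-loop on block "a".
cfg = {"entry": ["a", "b"], "a": ["a", "exit"], "b": ["exit"], "exit": []}
print(dfs_preorder(cfg, "entry"))
```

The explicit stack avoids deep call chains on large functions, at the cost of having to skip blocks that were pushed more than once.
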
pengfei updated the diff for D95136: [X86] Fix tile config register spill issue..

Fix a missing case in function updateBBStatus.

Sun, Jan 24, 7:09 AM · Restricted Project
pengfei added inline comments to D95136: [X86] Fix tile config register spill issue..
Sun, Jan 24, 5:03 AM · Restricted Project
pengfei updated the diff for D95136: [X86] Fix tile config register spill issue..

Address Yuanke's comment. Moving the insertion to check AMX block.

Sun, Jan 24, 5:00 AM · Restricted Project
pengfei updated the diff for D95136: [X86] Fix tile config register spill issue..

Address Yuanke's comments.

Sun, Jan 24, 3:31 AM · Restricted Project

Sat, Jan 23

pengfei added inline comments to D95136: [X86] Fix tile config register spill issue..
Sat, Jan 23, 6:48 PM · Restricted Project
pengfei updated the diff for D95136: [X86] Fix tile config register spill issue..

A minor change.

Sat, Jan 23, 8:11 AM · Restricted Project
pengfei updated the diff for D95136: [X86] Fix tile config register spill issue..

This patch reduces the algorithmic complexity of function findNeedInsertBB, at the cost of slightly more code complexity than the previous version.

Sat, Jan 23, 7:51 AM · Restricted Project
pengfei updated the diff for D95136: [X86] Fix tile config register spill issue..

A minor refactor.

Sat, Jan 23, 5:28 AM · Restricted Project
pengfei updated the diff for D95136: [X86] Fix tile config register spill issue..

Remove an unnecessary ++MII;

Sat, Jan 23, 5:06 AM · Restricted Project
pengfei added inline comments to D95136: [X86] Fix tile config register spill issue..
Sat, Jan 23, 4:17 AM · Restricted Project
pengfei updated the summary of D95136: [X86] Fix tile config register spill issue..
Sat, Jan 23, 4:11 AM · Restricted Project
pengfei updated the diff for D95136: [X86] Fix tile config register spill issue..
  1. Address review comments from Xiang and Yuanke.
  2. Optimize the algorithm to a single DFS.
  3. Fix bugs when counting the place for tilerelease.
Sat, Jan 23, 4:10 AM · Restricted Project

Thu, Jan 21

pengfei added inline comments to D95136: [X86] Fix tile config register spill issue..
Thu, Jan 21, 11:57 PM · Restricted Project
pengfei updated the diff for D95136: [X86] Fix tile config register spill issue..

Fix Lint warnings.

Thu, Jan 21, 4:31 PM · Restricted Project
pengfei added a comment to D94155: [X86] Fix tile config register spill issue..
Thu, Jan 21, 7:27 AM · Restricted Project
pengfei requested review of D95136: [X86] Fix tile config register spill issue..
Thu, Jan 21, 7:23 AM · Restricted Project
pengfei added inline comments to D94155: [X86] Fix tile config register spill issue..
Thu, Jan 21, 12:49 AM · Restricted Project
pengfei added inline comments to D94155: [X86] Fix tile config register spill issue..
Thu, Jan 21, 12:35 AM · Restricted Project

Wed, Jan 20

pengfei added a comment to D93129: [LV] Do not use vector type to compute cost of scalar address comp..

I don't have much experience with performance tuning. Adding more people who contributed to the affected tests.

Wed, Jan 20, 6:39 AM · Restricted Project
pengfei added reviewers for D93129: [LV] Do not use vector type to compute cost of scalar address comp.: dorit, kbarton, echristo.
Wed, Jan 20, 6:39 AM · Restricted Project
pengfei accepted D94895: [X86] Add experimental option to separately tune alignment of innermost loops.

LGTM.

Wed, Jan 20, 5:21 AM · Restricted Project

Tue, Jan 19

pengfei accepted D94155: [X86] Fix tile config register spill issue..

LGTM, but let's wait for one day or two to see others' opinions.

Tue, Jan 19, 6:40 PM · Restricted Project
pengfei added inline comments to D94895: [X86] Add experimental option to separately tune alignment of innermost loops.
Tue, Jan 19, 6:19 PM · Restricted Project
pengfei added inline comments to D94155: [X86] Fix tile config register spill issue..
Tue, Jan 19, 1:17 AM · Restricted Project

Mon, Jan 18

pengfei accepted D94943: [X86][AMX] Fix the typo..

LGTM.

Mon, Jan 18, 10:19 PM · Restricted Project
pengfei accepted D94910: [X86][AMX] Clear AMX lit test case..

LGTM.

Mon, Jan 18, 7:05 PM · Restricted Project
pengfei added inline comments to D94910: [X86][AMX] Clear AMX lit test case..
Mon, Jan 18, 6:31 PM · Restricted Project
pengfei added inline comments to D94910: [X86][AMX] Clear AMX lit test case..
Mon, Jan 18, 6:01 AM · Restricted Project
pengfei accepted D94772: [X86] Fix tile spill merge issue..

LGTM.

Mon, Jan 18, 4:38 AM · Restricted Project

Sun, Jan 17

pengfei added inline comments to D94772: [X86] Fix tile spill merge issue..
Sun, Jan 17, 5:34 PM · Restricted Project
pengfei added inline comments to D94772: [X86] Fix tile spill merge issue..
Sun, Jan 17, 5:21 PM · Restricted Project
pengfei added inline comments to D94614: [FPEnv][X86] Platform builtins edition: clang should get from the AST the metadata for constrained FP builtins.
Sun, Jan 17, 5:02 PM · Restricted Project

Fri, Jan 15

pengfei updated subscribers of D94726: [X86] Add segment and address-size override prefixes.
Fri, Jan 15, 9:12 PM · Restricted Project
pengfei updated subscribers of D82862: [ThinLTO] Always parse module level inline asm with At&t dialect.
Fri, Jan 15, 8:35 PM · Restricted Project, Restricted Project
pengfei added a comment to D94614: [FPEnv][X86] Platform builtins edition: clang should get from the AST the metadata for constrained FP builtins.

This doesn't add metadata to llvm intrinsics that are not constrained.

Oh, right. I misunderstood what these patches are doing and thought we could now add metadata to any intrinsic via CGFPOptionsRAII. :-)

If a relevant #pragma is used then without this change the metadata ...

Yeah, I understand it now. Thank you! But why do we still have the wrong maytrap with this patch?

It's true that the middle-end optimizers know nothing about the constrained intrinsics. Today. I plan on changing that in the near future.

Brilliant!

If use of the constrained intrinsics will cause breakage of the target-specific clang builtins then that's important to know and to fix.

I had a look at these changes and didn't find anything that would cause breakage for now. I'm still not sure whether we need to teach target-specific clang builtins to respect strict FP. But that has nothing to do with this patch.

Fri, Jan 15, 8:07 AM · Restricted Project

Thu, Jan 14

pengfei added a comment to D82862: [ThinLTO] Always parse module level inline asm with At&t dialect.

What is the reason for treating this differently in LLVM?

Thu, Jan 14, 7:34 PM · Restricted Project, Restricted Project
pengfei added a comment to D94155: [X86] Fix tile config register spill issue..

tile config register is the user of each AMX instruction

Thu, Jan 14, 7:19 AM · Restricted Project

Wed, Jan 13

pengfei added a comment to D94614: [FPEnv][X86] Platform builtins edition: clang should get from the AST the metadata for constrained FP builtins.

Hi Kevin, what's the intention of adding constrained FP metadata for target-dependent builtins? I believe the middle-end passes always ignore these builtins. What's more, will it imply to users that these builtins behave differently under different FP models? That's not true for most platform builtins, since they just represent specific instructions.

Wed, Jan 13, 6:37 PM · Restricted Project

Tue, Jan 12

pengfei added inline comments to D94466: [X86] merge "={eax}" and "~{eax}" into "=&eax" for MSInlineASM.
Tue, Jan 12, 11:27 PM · Restricted Project
pengfei added inline comments to D94155: [X86] Fix tile config register spill issue..
Tue, Jan 12, 8:10 PM · Restricted Project

Mon, Jan 11

pengfei added inline comments to D94466: [X86] merge "={eax}" and "~{eax}" into "=&eax" for MSInlineASM.
Mon, Jan 11, 9:54 PM · Restricted Project
pengfei accepted D94372: [X86][AMX] Prohibit pointer cast on load..

LGTM, but I'd like to see if @lebedev.ri has any objections.

Mon, Jan 11, 8:41 PM · Restricted Project

Sun, Jan 10

pengfei added inline comments to D94372: [X86][AMX] Prohibit pointer cast on load..
Sun, Jan 10, 4:42 AM · Restricted Project
pengfei added a comment to D94372: [X86][AMX] Prohibit pointer cast on load..

to amx intrinsics the the

Sun, Jan 10, 12:12 AM · Restricted Project

Fri, Jan 8

pengfei added a comment to D94268: Allow _mm_empty() (via llvm.x86.mmx.emms) to be a no-op without MMX..

Is inline assembly the only case where the emms instruction will be needed? But inline assembly doesn't enable the mmx attribute automatically, right? E.g. https://godbolt.org/z/43ases

Yes, inline or external asm should be the only reason there should be any MMX register usage when all is done here. After this patch, the default is still to have mmx enabled with sse, even though clang won't use it. But users can pass -mno-mmx if they like. And, yes, clang only requires -mmmx if you use the "y" asm constraint, not if you use mmx instructions inside the asm string.

I wrote this patch because making it a no-op is the same behavior GCC has. However, I'm not sure this is necessarily the right way to go. On the plus side for this patch, it allows intrinsic-using code to stop emitting spurious emms instructions, if compiled with -mno-mmx. However, the negative is that inline-asm code which _doesn't_ use the "y" constraint might still be using MMX within an asm blob, and be depending on calls to _mm_empty() outside of the asm, and such code would be silently broken when compiled with -mno-mmx.

At first, I thought that most uses of inline-asm would be using constraints, but after looking around at existing MMX asm, it seems that nearly all of it does _not_ use a "y" constraint or even clobber any fpu or mmx registers. And they do also depend on _mm_empty() in combination with their unmarked inline asm. Which...now that I think about it more, makes passing -mno-mmx to the compiler almost entirely pointless.

So, now I'm thinking I'll just drop this change, actually.

Fri, Jan 8, 6:58 PM · Restricted Project, Restricted Project

Thu, Jan 7

pengfei committed rGc102b9697bd4: [X86] Correct the comments about comparison intrinsics. NFCI. (authored by pengfei).
[X86] Correct the comments about comparison intrinsics. NFCI.
Thu, Jan 7, 11:37 PM
pengfei added a comment to D94268: Allow _mm_empty() (via llvm.x86.mmx.emms) to be a no-op without MMX..

Is inline assembly the only case where the emms instruction will be needed? But inline assembly doesn't enable the mmx attribute automatically, right? E.g. https://godbolt.org/z/43ases
Analyzing the asm block and appending the mmx attribute when we see mmx instructions might be needed. But if we do that analysis, just adding an emms instruction at the end of the block seems better.

Thu, Jan 7, 7:31 PM · Restricted Project, Restricted Project

Tue, Jan 5

pengfei added inline comments to D94155: [X86] Fix tile config register spill issue..
Tue, Jan 5, 11:43 PM · Restricted Project
pengfei accepted D94118: [X86] ESP should not be in the regcall CSR list.

LGTM.

Tue, Jan 5, 7:49 PM · Restricted Project
pengfei added inline comments to D94118: [X86] ESP should not be in the regcall CSR list.
Tue, Jan 5, 7:30 PM · Restricted Project
pengfei added inline comments to D94134: [X86] Add TLBSYNC, INVLPGB and SNP instructions.
Tue, Jan 5, 6:44 PM · lld, Restricted Project
pengfei added a comment to D94118: [X86] ESP should not be in the regcall CSR list.

I saw that the document says ESP/RSP should be preserved for regcall. But I cannot think of a situation where ESP/RSP would need to be preserved either.

Tue, Jan 5, 6:34 PM · Restricted Project

Mon, Jan 4

pengfei added inline comments to D93597: [X86][SSE] Enable constexpr on some basic SSE intrinsics (RFC).
Mon, Jan 4, 6:22 AM · Restricted Project

Wed, Dec 30

pengfei accepted D92837: [X86] Support tilezero intrinsic and c interface for AMX..

LGTM.

Wed, Dec 30, 7:44 PM · Restricted Project, Restricted Project
pengfei accepted D93931: [X86] Don't fold negative offset into 32-bit absolute address (e.g. movl $foo-1, %eax).

LGTM.

Wed, Dec 30, 5:17 PM · Restricted Project
pengfei added a comment to D93860: [SLP] delete unused pairwise reduction option.

Let me know if I am not answering/understanding the questions.

Wed, Dec 30, 5:16 PM · Restricted Project
pengfei added a comment to D93860: [SLP] delete unused pairwise reduction option.

Hi Sanjay, pairwise reduction does not seem to be adopted by any target. But do we need to consider the FMF here? I found that LangRef says the intrinsic is sequential for fadd and fmul if the reassoc flag is not set.

Hi @pengfei -
Sorry for the delayed reply - I did not get an email notification for your comment here in Phab.

We do not need to care about FMF directly here because neither of these shuffle variations corresponds to "sequential" reduction of the elements. In other words, we should require reassoc (at the least) to form an fadd or fmul reduction that is expected to need any shuffles. Let me know if you see any holes in that theory.

One motivation for this cleanup is to correct bugs with using/propagating FMF (that was also true for D87416). For example - https://llvm.org/PR35538

Wed, Dec 30, 7:30 AM · Restricted Project
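
The distinction drawn in the entry above between a sequential reduction and a shuffle-based one matters because floating-point addition is not associative. The following sketch is an illustration only: the helper names `sequential_sum` and `tree_sum` are hypothetical, and adjacent pairing is just one possible tree shape, not a claim about what any particular compiler or intrinsic emits.

```python
def sequential_sum(values):
    """Strictly left-to-right reduction, starting from 0.0."""
    total = 0.0
    for v in values:
        total += v
    return total


def tree_sum(values):
    """Pairwise (binary-tree) reduction.

    Assumes len(values) is a power of two, as with a full vector register.
    """
    level = list(values)
    while len(level) > 1:
        # Add adjacent pairs, halving the number of partial sums each round.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


# 1e16 is large enough that adding 1.0 to it is lost to rounding.
vals = [1.0, 1e16, -1e16, 1.0]
print(sequential_sum(vals))  # 1.0
print(tree_sum(vals))        # 0.0
```

Because the two orders can produce different results, a reduction that reorders the additions is only a valid replacement for the sequential form when reassociation is permitted.
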
pengfei added inline comments to D91927: [X86] Add x86_amx type for intel AMX..
Wed, Dec 30, 6:35 AM · Restricted Project, Restricted Project
pengfei committed rG16c2067cf212: [X86][AMX] Fix compilation warning introduced by 981a0bd8. (authored by pengfei).
[X86][AMX] Fix compilation warning introduced by 981a0bd8.
Wed, Dec 30, 6:34 AM

Tue, Dec 29

pengfei added inline comments to D93931: [X86] Don't fold negative offset into 32-bit absolute address (e.g. movl $foo-1, %eax).
Tue, Dec 29, 11:46 PM · Restricted Project
pengfei added a comment to D93179: [X86] Convert fmin/fmax _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506).

Hi Simon, I found we have the same problem for fadd/fmul. See https://godbolt.org/z/3YKaGx
X86 intrinsics imply the reassoc flag, but llvm.vector.reduce.* doesn't.

I'm not surprised - my current plan (after the holidays) is to add doxygen descriptions for all the reduction intrinsics and then update them making it clear what fast-math flags are assumed.

Tue, Dec 29, 4:01 AM · Restricted Project

Mon, Dec 28

pengfei added a comment to D93860: [SLP] delete unused pairwise reduction option.

Hi Sanjay, pairwise reduction does not seem to be adopted by any target. But do we need to consider the FMF here? I found that LangRef says the intrinsic is sequential for fadd and fmul if the reassoc flag is not set.

Mon, Dec 28, 6:03 PM · Restricted Project
pengfei added a comment to D93179: [X86] Convert fmin/fmax _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506).

Hi Simon, I found we have the same problem for fadd/fmul. See https://godbolt.org/z/3YKaGx
X86 intrinsics imply the reassoc flag, but llvm.vector.reduce.* doesn't.

Mon, Dec 28, 5:50 PM · Restricted Project

Dec 24 2020

pengfei accepted D91927: [X86] Add x86_amx type for intel AMX..

LGTM. Thanks for the refactors. Maybe better to wait for a few days to see if others have objections.

Dec 24 2020, 12:34 AM · Restricted Project, Restricted Project

Dec 23 2020

pengfei added inline comments to D91927: [X86] Add x86_amx type for intel AMX..
Dec 23 2020, 11:35 PM · Restricted Project, Restricted Project
pengfei accepted D93792: [X86] Refactor AMX test case, remove unnecessary code..

LGTM.

Dec 23 2020, 10:16 PM · Restricted Project
pengfei added a comment to D91927: [X86] Add x86_amx type for intel AMX..

In my test case, it is transformed after Combine redundant instructions.

Can we disable it for the AMX type? A pointer to the AMX type is meaningless and may result in bad performance.

Ok, I'll disable the transform for AMX type.

Dec 23 2020, 7:57 PM · Restricted Project, Restricted Project
pengfei added a comment to D91927: [X86] Add x86_amx type for intel AMX..

In my test case, it is transformed after Combine redundant instructions.

Dec 23 2020, 6:15 AM · Restricted Project, Restricted Project

Dec 22 2020

pengfei added inline comments to D91927: [X86] Add x86_amx type for intel AMX..
Dec 22 2020, 6:57 AM · Restricted Project, Restricted Project

Dec 21 2020

pengfei added inline comments to D91927: [X86] Add x86_amx type for intel AMX..
Dec 21 2020, 5:46 AM · Restricted Project, Restricted Project

Dec 19 2020

pengfei accepted D93524: [X86] Teach assembler to accept vmsave/vmload/vmrun/invlpga/skinit with or without the fixed register operands.

LGTM.

Dec 19 2020, 4:49 AM · Restricted Project

Dec 17 2020

pengfei added inline comments to D93506: [NFC][utils] Factor remaining APIs under FunctionTestBuilder.
Dec 17 2020, 10:21 PM · Restricted Project
pengfei accepted D93506: [NFC][utils] Factor remaining APIs under FunctionTestBuilder.

LGTM.

Dec 17 2020, 10:03 PM · Restricted Project

Dec 16 2020

pengfei accepted D93413: [NFC] factor update test function test builder as a class.

LGTM.

Dec 16 2020, 8:38 PM · Restricted Project
pengfei added inline comments to D93413: [NFC] factor update test function test builder as a class.
Dec 16 2020, 6:08 PM · Restricted Project

Dec 15 2020

pengfei added inline comments to D93078: [utils] Fix UpdateTestChecks case where 2 runs differ for last label.
Dec 15 2020, 9:21 PM · Restricted Project, Restricted Project

Dec 14 2020

pengfei accepted D93078: [utils] Fix UpdateTestChecks case where 2 runs differ for last label.

LGTM. Thanks.
update_test_prefix.py assumes the conflicting output. You may need to change its expectation as well.

Dec 14 2020, 9:34 PM · Restricted Project, Restricted Project
pengfei accepted D93173: [X86] Add test case for commit e52bc1d2bba794b..

LGTM.

Dec 14 2020, 6:23 PM · Restricted Project
pengfei added a comment to D93179: [X86] Convert fmin/fmax _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506).

If we're going by existing behavior/compatibility, gcc/icc use packed ops too:
https://godbolt.org/z/9jEhaW
...so there's an implicit 'nnan nsz' in these intrinsics (and that should be documented in the header file (and file a bug for Intel's page at https://software.intel.com/sites/landingpage/IntrinsicsGuide/ ?).

Dec 14 2020, 4:44 PM · Restricted Project
pengfei added a comment to D93078: [utils] Fix UpdateTestChecks case where 2 runs differ for last label.

What's the difference from the existing code? It looks to me like you just moved the warning out of the loop, right?

Dec 14 2020, 5:00 AM · Restricted Project, Restricted Project
pengfei added a comment to D93179: [X86] Convert fmin/fmax _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506).

Yes, it is using maxpd/minpd here. Sorry for missing the NaN cases.
In fact, I have doubts about how fast math should behave with target intrinsics. In my opinion, target intrinsics are always associated with specific instructions (reduce* are exceptions). So the behavior of an intrinsic, e.g. respecting NaNs, signaling, rounding, exceptions etc., is consistent with its associated instruction. That is, the fast math flags won't change the behavior of intrinsics.
Assuming the above, I'm happy to set these expansions to fast math. Either keeping the existing implementation or the expansion LGTM.

Dec 14 2020, 4:27 AM · Restricted Project
pengfei added inline comments to D93173: [X86] Add test case for commit e52bc1d2bba794b..
Dec 14 2020, 3:50 AM · Restricted Project

Dec 11 2020

pengfei accepted D92940: [X86] Convert fadd/fmul _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506).

LGTM. Thanks for bringing this refactor.
I also verified that ICC and GCC both do the reduce math in a binary-tree fashion, though sometimes ICC's result differs from GCC's and Clang's in the LSB.

Dec 11 2020, 7:00 AM · Restricted Project

Dec 10 2020

pengfei added inline comments to D93078: [utils] Fix UpdateTestChecks case where 2 runs differ for last label.
Dec 10 2020, 9:24 PM · Restricted Project, Restricted Project
pengfei added a comment to D92965: [NFC] Remove unused prefixes in llvm/test/CodeGen/X86.

looks like quite a few (3K) of tests in CodeGen (14K or so) use it. So I wanted to understand why removing a prefix used nowhere in the test lead to an incorrect rewrite of some assertions.

There are more than 6K tests. update_llc_test_checks.py is just one of the update scripts.

Dec 10 2020, 8:10 PM · Restricted Project
pengfei added a comment to D93078: [utils] Fix UpdateTestChecks case where 2 runs differ for last label.

Adding the original author for review.

Dec 10 2020, 7:48 PM · Restricted Project, Restricted Project
pengfei added a reviewer for D93078: [utils] Fix UpdateTestChecks case where 2 runs differ for last label: spatel.
Dec 10 2020, 7:48 PM · Restricted Project, Restricted Project
pengfei added a comment to D92965: [NFC] Remove unused prefixes in llvm/test/CodeGen/X86.

I think you meant --allow-unused-prefixes=true.

Yes, that's what I meant. Thanks for correcting me.

Dec 10 2020, 6:25 AM · Restricted Project

Dec 9 2020

pengfei added a comment to D92965: [NFC] Remove unused prefixes in llvm/test/CodeGen/X86.

I left out about 12 tests when I fixed the prefix issues using the script, because the prefix changes would result in conflicts when re-generating them. I suspected it was a bug in update_llc_test_checks.py, but I didn't find time to look into it. Maybe we can add --allow-unused-prefixes=false for these failed tests as a workaround?

I actually just undid the re-generation changes, leaving just the unused prefix removal. I'm not sure what the pros/cons to re-generation would be - my reasoning was about "keeping this change scoped". The tests passed, though with a FileCheck with the flag flipped to not allow unused prefixes.

Dec 9 2020, 11:20 PM · Restricted Project
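
The notion of an "unused prefix" discussed in the entry above can be made concrete with a simplified sketch. This is not how FileCheck or update_llc_test_checks.py is implemented: the function name `unused_prefixes` and the matching rules are assumptions made for illustration, and the sketch only looks at `--check-prefix` and `--check-prefixes` options on RUN lines.

```python
import re

# Matches --check-prefix=A or --check-prefixes=A,B (one or two leading dashes).
PREFIX_OPTION = re.compile(r"-{1,2}check-prefix(?:es)?[= ]([\w,-]+)")


def unused_prefixes(test_text):
    """Return declared check prefixes that no check line in the test uses."""
    lines = test_text.splitlines()
    run_lines = [l for l in lines if "RUN:" in l]
    other_lines = [l for l in lines if "RUN:" not in l]

    # Collect prefixes in declaration order, without duplicates.
    declared = []
    for line in run_lines:
        for match in PREFIX_OPTION.finditer(line):
            for prefix in match.group(1).split(","):
                if prefix and prefix not in declared:
                    declared.append(prefix)

    unused = []
    for prefix in declared:
        # Treat "PREFIX:" and "PREFIX-SUFFIX:" (e.g. -NEXT, -LABEL) as uses.
        # Simplification: a longer prefix such as "AVX-SLOW" would also be
        # counted as a use of "AVX".
        use = re.compile(r"(?<![\w-])" + re.escape(prefix) + r"(?:-[A-Z]+)?:")
        if not any(use.search(l) for l in other_lines):
            unused.append(prefix)
    return unused


# Usage with a hypothetical test file: AVX is declared but never checked.
example = """\
; RUN: llc < %s -mtriple=x86_64-- | FileCheck %s --check-prefixes=CHECK,SSE
; RUN: llc < %s -mtriple=x86_64-- -mattr=+avx | FileCheck %s --check-prefixes=CHECK,AVX
; CHECK-LABEL: foo:
; SSE: addps
"""
print(unused_prefixes(example))  # ['AVX']
```
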
pengfei accepted D87981: [X86] AMX programming model..

LGTM. I think we can land this patch as a starting point. Cheers~

Dec 9 2020, 11:10 PM · Restricted Project, Restricted Project