Page MenuHomePhabricator

wenlei (Wenlei He)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 3 2019, 5:55 PM (51 w, 3 d)

Recent Activity

Fri, Mar 20

wenlei added inline comments to D76255: [SampleFDO] Port MD5 name table support to extbinary format..
Fri, Mar 20, 3:12 PM

Thu, Mar 19

wenlei accepted D76248: Fix a bug in the inliner that causes subsequent double inlining.

Thanks for fixing this! I forgot to ask earlier, but this could happen with inlining under new PM as well, right? With new PM, we collect calls within SCC, then iterate over them for inlining, so for the same reason we could have dangling call in VMap from cloning?

Yes, it could, though that didn't happen in our case. I think it's due to different orders calls are processed in the legacy and new inliner.

Thu, Mar 19, 12:35 PM · Restricted Project
wenlei added reviewers for D76248: Fix a bug in the inliner that causes subsequent double inlining: davidxl, tejohnson.
Thu, Mar 19, 9:11 AM · Restricted Project
wenlei added a comment to D76248: Fix a bug in the inliner that causes subsequent double inlining.

Thanks for fixing this! I forgot to ask earlier, but this could happen with inlining under new PM as well, right? With new PM, we collect calls within SCC, then iterate over them for inlining, so for the same reason we could have dangling call in VMap from cloning?

Thu, Mar 19, 9:11 AM · Restricted Project
wenlei added a comment to D76255: [SampleFDO] Port MD5 name table support to extbinary format..

Great to MD5 support added to ext-bin, thanks @wmi. After this change it seems compact-bin is completely superseded by ext-bin, do you plan to deprecate/remove it?

Thu, Mar 19, 8:37 AM

Mon, Mar 9

wenlei accepted D75664: [Coroutines] Also check lifetime intrinsic for local variable when build coroutine frame.

Thanks for clarification, LGTM!

Mon, Mar 9, 9:08 AM · Restricted Project

Thu, Mar 5

wenlei added a comment to D75664: [Coroutines] Also check lifetime intrinsic for local variable when build coroutine frame.

Nice change, thanks @junparser! The improved dataflow analysis using lifetime instead of allocation point and the resulting shrink-wrapping style opt for alloca would definitely be beneficial.

Thu, Mar 5, 8:45 AM · Restricted Project

Tue, Mar 3

wenlei added a comment to D74176: [CMake] Link against ZLIB::ZLIB.

This change broke our use of LLVMSupport library, and I also noticed it's been reverted already.

Tue, Mar 3, 3:33 PM · Restricted Project
wenlei added a comment to D72490: Fix cmake for zlib.

I'm not sure I understand the motivation for this change. Can you please expand on that? How are you using it that it doesn't work for you?

Tue, Mar 3, 3:33 PM · Restricted Project

Fri, Feb 28

wenlei added a comment to D75345: [Coroutines][new pass manager] Move CoroElide pass to right position.

Thanks for the patch, @junparser! Curious how did you notice this - was it causing performance regression for your workload?

Fri, Feb 28, 2:18 PM · Restricted Project

Feb 25 2020

wenlei abandoned D75101: [OpenMP] Fix insertion point for deduplicated runtime call to honor def-use order.
Feb 25 2020, 8:26 AM · Restricted Project
wenlei added a comment to D75101: [OpenMP] Fix insertion point for deduplicated runtime call to honor def-use order.

@jdoerfert good to know it's being fixed - thanks! I saw that fix is accepted, do you plan to land it now, asking because this broke dozens of internal builds, so we want to resolve it asap - deciding whether to wait for your fix to go in or we need to port over your fix manually.

Feb 25 2020, 8:17 AM · Restricted Project

Feb 24 2020

wenlei created D75101: [OpenMP] Fix insertion point for deduplicated runtime call to honor def-use order.
Feb 24 2020, 11:38 PM · Restricted Project
wenlei added inline comments to D69930: [OpenMP] Introduce the OpenMPOpt transformation pass.
Feb 24 2020, 9:32 PM · Restricted Project

Feb 23 2020

wenlei added a comment to D74814: IR printing for single function with the new pass manager..

Committed on behalf of @hoyFB per request.

Feb 23 2020, 3:34 PM · Restricted Project, Restricted Project
wenlei committed rGbae33a7c5a1f: IR printing for single function with the new pass manager. (authored by hoyFB).
IR printing for single function with the new pass manager.
Feb 23 2020, 3:30 PM
wenlei closed D74814: IR printing for single function with the new pass manager..
Feb 23 2020, 3:29 PM · Restricted Project, Restricted Project

Feb 19 2020

wenlei accepted D74814: IR printing for single function with the new pass manager..

Thanks for the making the changes, it would be nice to have some consistency (the same structure) between test cases for legacy PM and new PM, e.g. EMPTY is only tested for legacy PM, but not new; multi-function filter also only tested for legacy PM, would be nice to do them for both, and unify the check-prefix, otherwise LGTM.

Feb 19 2020, 11:18 AM · Restricted Project, Restricted Project
wenlei added a comment to D74814: IR printing for single function with the new pass manager..

Sounds good. I was trying to figure out how to invoke the new pass manager with OPT. Is there a command line switch to do that?

Feb 19 2020, 10:28 AM · Restricted Project, Restricted Project
wenlei added a comment to D74814: IR printing for single function with the new pass manager..

Thanks for fixing this. Could you use .ll IR file as test case and place it alongside with test/Other/module-pass-printer.ll or just reuse that one? We can also use opt to invoke new pass manager, so it doesn't need to be a clang test.

Feb 19 2020, 12:04 AM · Restricted Project, Restricted Project

Feb 18 2020

wenlei committed rG05c3907b88a0: Fix test for profile remapper (authored by wenlei).
Fix test for profile remapper
Feb 18 2020, 6:03 PM
wenlei closed D74799: Fix test for profile remapper.
Feb 18 2020, 6:03 PM · Restricted Project
wenlei added reviewers for D74799: Fix test for profile remapper: wmi, hoyFB.
Feb 18 2020, 3:11 PM · Restricted Project
wenlei created D74799: Fix test for profile remapper.
Feb 18 2020, 3:06 PM · Restricted Project

Feb 17 2020

wenlei accepted D71903: [Coroutines][6/6] Clang schedules new passes.
Feb 17 2020, 8:42 PM · Restricted Project
wenlei accepted D71902: [Coroutines][5/6] Add coroutine passes to pipeline.

LGTM. Thanks for fixing the coro cleanup issue with ThinLTO here.

Feb 17 2020, 8:38 PM · Restricted Project
wenlei accepted D71901: [Coroutines][4/6] New pass manager: coro-cleanup.
Feb 17 2020, 8:27 PM · Restricted Project
wenlei accepted D71900: [Coroutines][3/6] New pass manager: coro-elide.

Comment needs to be updated, otherwise LGTM.

Feb 17 2020, 8:24 PM · Restricted Project
wenlei accepted D71899: [Coroutines][2/6] New pass manager: coro-split.

Sorry for the delay, LGTM except the file comment as Johannes pointed out. We've tested this stack of coroutine patches with multiple large services internally, all working as expected, so I'm going to accept this stack as review feedbacks are addressed too.

Feb 17 2020, 8:17 PM · Restricted Project

Feb 6 2020

wenlei accepted D71898: [Coroutines][1/6] New pass manager: coro-early.

LGTM. We've tested this stack of coroutine patches with multiple large services internally at facebook, all working as expected.

Feb 6 2020, 11:18 PM · Restricted Project

Jan 20 2020

wenlei accepted D73060: [Clang] Fix expansion of response files in -Wp after integrated-cc1 change.

Is that the desired behavior? This was the previous behavior but I'm not sure it's the right behavior.

Jan 20 2020, 11:40 AM · Restricted Project

Jan 18 2020

wenlei added a comment to D69825: [Clang][Driver] Re-use the calling process instead of creating a new process for the cc1 invocation.

@aganea, @hans This patch broke response file expansion for preprocessor via -Wp - all our internal builds failed with this patch because we invoke clang with response file passed to preprocessor, e.g. clang++ ... -Wp,@pp.rsp @cc.rsp. Looks like we can't just call ExecuteCC1Tool directly, because response file expansion happens in main, so with this patch, -Wp,@pp.rsp won't be expanded properly.

Jan 18 2020, 11:48 PM · Restricted Project, Restricted Project

Jan 9 2020

wenlei added a comment to D72490: Fix cmake for zlib.

Is there a way to get the basename from ZLIB_LIBRARY by any chance? Not saying it's actually necessary, just wondering.

Jan 9 2020, 10:37 PM · Restricted Project
wenlei added a reviewer for D72490: Fix cmake for zlib: compnerd.
Jan 9 2020, 6:53 PM · Restricted Project
wenlei created D72490: Fix cmake for zlib.
Jan 9 2020, 6:53 PM · Restricted Project

Jan 7 2020

wenlei added inline comments to D71899: [Coroutines][2/6] New pass manager: coro-split.
Jan 7 2020, 4:49 PM · Restricted Project

Jan 5 2020

wenlei added inline comments to D71899: [Coroutines][2/6] New pass manager: coro-split.
Jan 5 2020, 4:07 PM · Restricted Project
wenlei added a comment to D71899: [Coroutines][2/6] New pass manager: coro-split.

Thanks for making the changes to schedule 2nd coro-split so funclets get fully optimized. A few comments inline, looks good otherwise.

Jan 5 2020, 3:28 PM · Restricted Project
wenlei added inline comments to D71901: [Coroutines][4/6] New pass manager: coro-cleanup.
Jan 5 2020, 2:51 PM · Restricted Project
wenlei added a comment to D72226: Add LazyCallGraph API to add function to RefSCC.

The short-circuit change makes sense to me. But what caused the creation of extra SCCs due to self-recursion? Is that because of the short-cut from line 808-825 and the clearing of SCC stack and DFS stack broke the LowLink and DFSNumber check of later Tarjan DFS?

Jan 5 2020, 12:54 PM · Restricted Project

Dec 30 2019

wenlei added a comment to D71899: [Coroutines][2/6] New pass manager: coro-split.

The devirt trigger here is to restart CGSCC pipeline to run coro-split again to split the coroutine function. There is no need to introduce devirt trigger in NewPM since we call coro-split pass manually.

Dec 30 2019, 8:41 PM · Restricted Project
wenlei added inline comments to D71903: [Coroutines][6/6] Clang schedules new passes.
Dec 30 2019, 12:42 PM · Restricted Project
wenlei added a comment to D71899: [Coroutines][2/6] New pass manager: coro-split.

My understanding is devirt trigger is only a trick used with Legacy PM to run optimizations on coroutine funclets after coro-split. With NewPM's CGSCCUpdateResult infra, can we communicate that change and request passes directly with ModuleToPostOrderCGSCCPassAdaptor, without artificially introducing the indirect call and devirt?

Dec 30 2019, 12:36 PM · Restricted Project

Dec 11 2019

wenlei committed rGd275a0648717: [AutoFDO] Statistic for context sensitive profile guided inlining (authored by wenlei).
[AutoFDO] Statistic for context sensitive profile guided inlining
Dec 11 2019, 9:50 PM
wenlei closed D70584: [AutoFDO] Statistic for context sensitive profile guided inlining.
Dec 11 2019, 9:50 PM · Restricted Project
wenlei added inline comments to D70584: [AutoFDO] Statistic for context sensitive profile guided inlining.
Dec 11 2019, 9:41 PM · Restricted Project
wenlei updated the diff for D70584: [AutoFDO] Statistic for context sensitive profile guided inlining.

address feedback

Dec 11 2019, 9:41 PM · Restricted Project
wenlei updated the diff for D70584: [AutoFDO] Statistic for context sensitive profile guided inlining.

rebase

Dec 11 2019, 10:04 AM · Restricted Project

Dec 7 2019

wenlei updated the diff for D70584: [AutoFDO] Statistic for context sensitive profile guided inlining.

update test case

Dec 7 2019, 9:27 AM · Restricted Project
wenlei updated the diff for D70584: [AutoFDO] Statistic for context sensitive profile guided inlining.

Restructured remarks sample profile loader inlining:

  1. Split into InlineAttempt, InlineSuccess and InlineFail for remark names.
  2. Use OptimizationRemark only for InlineSuccess and OptimizationRemarkAnalysis for the rest.
Dec 7 2019, 9:16 AM · Restricted Project

Dec 6 2019

wenlei committed rG7b61ae68ecd7: [AutoFDO] Inline replay for cold/small callees from sample profile loader (authored by wenlei).
[AutoFDO] Inline replay for cold/small callees from sample profile loader
Dec 6 2019, 11:48 AM
wenlei closed D70750: [AutoFDO] Inline replay for cold/small callees from sample profile loader.
Dec 6 2019, 11:47 AM · Restricted Project

Dec 5 2019

wenlei committed rG532196d811ad: [AutoFDO] Top-down Inlining for specialization with context-sensitive profile (authored by wenlei).
[AutoFDO] Top-down Inlining for specialization with context-sensitive profile
Dec 5 2019, 4:23 PM
wenlei closed D70655: [AutoFDO] Top-down Inlining for specialization with context-sensitive profile.
Dec 5 2019, 4:23 PM · Restricted Project
wenlei committed rGe503fd85d3ac: [AutoFDO] Properly merge context-sensitive profile of inlinee back to outlined… (authored by wenlei).
[AutoFDO] Properly merge context-sensitive profile of inlinee back to outlined…
Dec 5 2019, 4:05 PM
wenlei closed D70653: [AutoFDO] Properly merge context-sensitive profile of inlinee back to outlined function.
Dec 5 2019, 4:05 PM · Restricted Project
wenlei added a comment to D70750: [AutoFDO] Inline replay for cold/small callees from sample profile loader.

Did performance test and I saw 0.4% improvement in an internal benchmark. That is a good improvement, thanks for the change!

Dec 5 2019, 2:39 PM · Restricted Project
wenlei updated the diff for D70750: [AutoFDO] Inline replay for cold/small callees from sample profile loader.

address feedback

Dec 5 2019, 1:31 PM · Restricted Project
wenlei updated the diff for D70653: [AutoFDO] Properly merge context-sensitive profile of inlinee back to outlined function.

address feedback

Dec 5 2019, 10:52 AM · Restricted Project
wenlei added inline comments to D70653: [AutoFDO] Properly merge context-sensitive profile of inlinee back to outlined function.
Dec 5 2019, 10:52 AM · Restricted Project

Dec 3 2019

wenlei updated the diff for D70653: [AutoFDO] Properly merge context-sensitive profile of inlinee back to outlined function.

address review feedback - only do fuzzy match for call site targets for indirect calls, as a result, I had to fix a few test cases.

Dec 3 2019, 4:56 PM · Restricted Project
wenlei added inline comments to D70653: [AutoFDO] Properly merge context-sensitive profile of inlinee back to outlined function.
Dec 3 2019, 4:46 PM · Restricted Project

Nov 27 2019

wenlei added a comment to D70653: [AutoFDO] Properly merge context-sensitive profile of inlinee back to outlined function.
In D70653#1761978, @wmi wrote:

I did performance test combing D70653 and D70655, there was ~0.2% latency improvement in a benchmark, so that is good.

Nov 27 2019, 10:18 AM · Restricted Project

Nov 26 2019

wenlei updated the diff for D70750: [AutoFDO] Inline replay for cold/small callees from sample profile loader.

update test case

Nov 26 2019, 4:24 PM · Restricted Project
wenlei created D70750: [AutoFDO] Inline replay for cold/small callees from sample profile loader.
Nov 26 2019, 4:05 PM · Restricted Project

Nov 25 2019

wenlei added inline comments to D70655: [AutoFDO] Top-down Inlining for specialization with context-sensitive profile.
Nov 25 2019, 4:19 PM · Restricted Project
wenlei updated the diff for D70655: [AutoFDO] Top-down Inlining for specialization with context-sensitive profile.

address review feedback

Nov 25 2019, 4:10 PM · Restricted Project
wenlei added a comment to D70584: [AutoFDO] Statistic for context sensitive profile guided inlining.

I would imagine among all the functions with inline instance in profile, only those small and warm/cold functions which are not inlined in early inliner will be inlined in regular inliner.

Nov 25 2019, 3:14 PM · Restricted Project
wenlei added a comment to D70655: [AutoFDO] Top-down Inlining for specialization with context-sensitive profile.

Thanks for the discussions and code review. I'll address the comments later, but wanted to answer this one first.

Nov 25 2019, 2:28 PM · Restricted Project
wenlei added a comment to D70653: [AutoFDO] Properly merge context-sensitive profile of inlinee back to outlined function.

The only concern is for inline instance getEntrySamples is not precise and sometimes can have a large difference with the actual head count

Nov 25 2019, 1:42 PM · Restricted Project
wenlei added a comment to D70655: [AutoFDO] Top-down Inlining for specialization with context-sensitive profile.

One way to handle it is 1) delay early inlining of sites into a hot function if a big percentage of calls to the function are from other modules. This still allows intra module top down inlining of it; or 2) keep a clone of the unlined body of the original function and use that one during cross module inlining.

Nov 25 2019, 11:23 AM · Restricted Project
wenlei added a comment to D70655: [AutoFDO] Top-down Inlining for specialization with context-sensitive profile.

Can this be handled for cross module (thinLTO) case somehow too?

Nov 25 2019, 10:34 AM · Restricted Project
wenlei added a comment to D52845: Update entry count for cold calls.

Wenlei, this sounds like a good idea. Patches are welcome!

Nov 25 2019, 12:42 AM · Restricted Project
wenlei created D70655: [AutoFDO] Top-down Inlining for specialization with context-sensitive profile.
Nov 25 2019, 12:06 AM · Restricted Project
wenlei added a child revision for D70653: [AutoFDO] Properly merge context-sensitive profile of inlinee back to outlined function: D70655: [AutoFDO] Top-down Inlining for specialization with context-sensitive profile.
Nov 25 2019, 12:06 AM · Restricted Project

Nov 24 2019

wenlei created D70653: [AutoFDO] Properly merge context-sensitive profile of inlinee back to outlined function.
Nov 24 2019, 11:37 PM · Restricted Project

Nov 22 2019

wenlei created D70584: [AutoFDO] Statistic for context sensitive profile guided inlining.
Nov 22 2019, 12:05 AM · Restricted Project

Nov 6 2019

wenlei committed rGba1dfae054b4: Keep import function list for inlinee profile update (authored by wenlei).
Keep import function list for inlinee profile update
Nov 6 2019, 6:43 PM
wenlei closed D69736: Keep import function list for inlinee profile update.
Nov 6 2019, 6:43 PM · Restricted Project

Nov 5 2019

wenlei added a comment to D69736: Keep import function list for inlinee profile update.
In D69736#1734892, @wmi wrote:

I saw 0.3% improvement in our server benchmark. That is a great improvement!

Nov 5 2019, 6:11 PM · Restricted Project

Nov 4 2019

wenlei updated the diff for D69736: Keep import function list for inlinee profile update.

Split UpdateProfile part, address other feedbacks.

Nov 4 2019, 8:09 PM · Restricted Project
wenlei added inline comments to D69736: Keep import function list for inlinee profile update.
Nov 4 2019, 8:00 PM · Restricted Project
wenlei added inline comments to D69732: [WIP][LTO] Apply SamplePGO pipeline tunes for ThinLTO pre-link to full LTO.
Nov 4 2019, 1:49 PM · Restricted Project, Restricted Project
wenlei updated the diff for D69736: Keep import function list for inlinee profile update.

update the right version of setEntryCount

Nov 4 2019, 9:32 AM · Restricted Project
wenlei updated the diff for D69736: Keep import function list for inlinee profile update.

address review feedback

Nov 4 2019, 9:03 AM · Restricted Project
wenlei added a comment to D69736: Keep import function list for inlinee profile update.

I agree setEntryCount always copying the existing import list is better than adding updateEntryCount. Adding a separate metadata also sounds good to me. I don't know the history why enbedding import list in function_entry_count. @tejohnson may know.

Nov 4 2019, 8:53 AM · Restricted Project

Nov 1 2019

wenlei added a comment to D69732: [WIP][LTO] Apply SamplePGO pipeline tunes for ThinLTO pre-link to full LTO.

The comments indicate that this is in part due to issues with
the new PM loop pass manager

Nov 1 2019, 2:24 PM · Restricted Project, Restricted Project
wenlei added a comment to D69736: Keep import function list for inlinee profile update.

This is a minimal point fix (was trying to keep internal patch minimum), but it might actually make sense to have setEntryCount always copy over the existing import list (if it exists), instead of adding updateEntryCount? Or even further, we could have a separate metadata for the import list, instead of piggyback on function_entry_count? To me, it seems a bit weird to embed import list as part of function_entry_count - wondering what's the reason behind that?

Nov 1 2019, 1:35 PM · Restricted Project
wenlei created D69736: Keep import function list for inlinee profile update.
Nov 1 2019, 1:24 PM · Restricted Project

Oct 16 2019

wenlei accepted D68901: [SampleFDO] Add profile remapping support for profile on-demand loading used by ExtBinary format profile.

Thanks for making the changes, LGTM!

Oct 16 2019, 7:04 PM · Restricted Project
wenlei added inline comments to D68901: [SampleFDO] Add profile remapping support for profile on-demand loading used by ExtBinary format profile.
Oct 16 2019, 10:46 AM · Restricted Project

Oct 14 2019

wenlei added inline comments to D68901: [SampleFDO] Add profile remapping support for profile on-demand loading used by ExtBinary format profile.
Oct 14 2019, 11:30 AM · Restricted Project

Oct 12 2019

Herald added a project to D52845: Update entry count for cold calls: Restricted Project.

@wmi @davidxl FYI, we have an internal patch to address this issue further. This patch only scales up entry count of outline function for cold callsites, thus losing context-sensitiveness. But ideally, we could reuse the entire profile from that inline context, and merge it all back to the outlined function. We changed the function annotation/inlining order to be top-down, then as we decide to not inline a cold call site, we merge the entire FunctionSamples of that inlinee back to outline counterpart, and since we process functions in top-down order (ignoring recursion for a second), annotation of the outline callee is able to use post merge FunctionSamples which is more accurate than simple scaling of entry count. (Doing context-sensitive inlining top-down has other benefits too, as it allows specialization whiling inlining which helps maximize the benefit of having context sensitive profile)

Oct 12 2019, 11:13 AM · Restricted Project

Oct 8 2019

wenlei accepted D68601: [SampleFDO] Add indexing for function profiles so they can be loaded on demand in ExtBinary format.

Symbol list can be provided to llvm-profdata in a plain text file.

Oct 8 2019, 3:45 PM · Restricted Project

Oct 7 2019

wenlei added a comment to D68601: [SampleFDO] Add indexing for function profiles so they can be loaded on demand in ExtBinary format.

Thanks for making indexing available for extended binary format. We've recently adding indexing to binary format as well in an internal patch, and also observed similar (~30%) build time reduction for some large services.

Oct 7 2019, 11:21 PM · Restricted Project
wenlei committed rGb3342e180e9c: [llvm-profdata] Minor format fix (authored by wenlei).
[llvm-profdata] Minor format fix
Oct 7 2019, 10:15 PM
wenlei committed rL373917: [llvm-profdata] Minor format fix.
[llvm-profdata] Minor format fix
Oct 7 2019, 10:15 PM
wenlei closed D68440: [llvm-profdata] Minor format fix.
Oct 7 2019, 10:15 PM · Restricted Project

Oct 3 2019

wenlei added a comment to D68440: [llvm-profdata] Minor format fix.

Before:

...
  13.1: inlined callee: _ZN6StringD2Ev: 0, 0, 0 sampled lines
    No samples collected in the function's body
    Samples collected in inlined callsites {
      0: inlined callee: _ZN6String4freeEv: 0, 0, 6 sampled lines
        Samples collected in the function's body {
          2: 0
          4: 0
          5: 0
          6: 0
          7: 0
          8: 0
        }
        Samples collected in inlined callsites {
          6: inlined callee: my_free: 0, 0, 1 sampled lines
            Samples collected in the function's body {
              4: 0
            }
            No inlined callsites in this function
}
}
}

After:

...
  13.1: inlined callee: _ZN6StringD2Ev: 0, 0, 0 sampled lines
    No samples collected in the function's body
    Samples collected in inlined callsites {
      0: inlined callee: _ZN6String4freeEv: 0, 0, 6 sampled lines
        Samples collected in the function's body {
          2: 0
          4: 0
          5: 0
          6: 0
          7: 0
          8: 0
        }
        Samples collected in inlined callsites {
          6: inlined callee: my_free: 0, 0, 1 sampled lines
            Samples collected in the function's body {
              4: 0
            }
            No inlined callsites in this function
        }
    }
}
Oct 3 2019, 9:31 PM · Restricted Project
wenlei created D68440: [llvm-profdata] Minor format fix.
Oct 3 2019, 7:03 PM · Restricted Project