wmi (Wei Mi)
User

Projects

User does not belong to any projects.

User Details

User Since
Feb 20 2015, 10:57 AM (174 w, 2 d)

Recent Activity

Fri, Jun 22

wmi added a comment to D48510: [SampleFDO] Add an option to turn on/off warning about samples unused.

Does this happen when the code the profile is being applied to is not being built with some necessary debug options like -gmlt? Can/should it be fixed by adding that or another option?

Fri, Jun 22, 7:18 PM
wmi created D48510: [SampleFDO] Add an option to turn on/off warning about samples unused.
Fri, Jun 22, 5:07 PM

Mon, Jun 11

wmi committed rL334476: [NFC] Change sample profile format enum name SPF_Raw_Binary to SPF_Binary.
Mon, Jun 11, 10:58 PM
wmi committed rL334475: Fix a typo in rL334447.
Mon, Jun 11, 9:47 PM
wmi committed rL334455: Fix a buildbot error reported by sanitizer-x86_64-linux-fast:
Mon, Jun 11, 4:43 PM
wmi committed rL334453: Fix a warning issued by clang.
Mon, Jun 11, 4:13 PM
wmi committed rL334449: Fix a warning reported by clang but not by gcc.
Mon, Jun 11, 3:55 PM
wmi committed rL334447: [SampleFDO] Add a new compact binary format for sample profile.
Mon, Jun 11, 3:45 PM
wmi closed D47955: [SampleFDO] Add a new compact binary format for sample profile.
Mon, Jun 11, 3:45 PM

Fri, Jun 8

wmi updated the diff for D47955: [SampleFDO] Add a new compact binary format for sample profile.

Update with a minor refactoring change.

Fri, Jun 8, 11:23 AM
wmi created D47955: [SampleFDO] Add a new compact binary format for sample profile.
Fri, Jun 8, 11:09 AM

May 10 2018

wmi committed rL332058: [SampleFDO] Don't treat warm callsite with inline instance in the profile as…
May 10 2018, 4:06 PM
wmi closed D45377: [SampleFDO] Don't let inliner treat warm callsite with inline instance in the profile as cold.
May 10 2018, 4:06 PM
wmi updated subscribers of D45377: [SampleFDO] Don't let inliner treat warm callsite with inline instance in the profile as cold.

Ping.

May 10 2018, 11:34 AM

May 3 2018

wmi added a comment to D45377: [SampleFDO] Don't let inliner treat warm callsite with inline instance in the profile as cold.

The iterative AFDO result is comparable to the AFDO result.

May 3 2018, 9:41 AM

May 2 2018

wmi accepted D46357: [GCOV] Emit the writeout function as nested loops of global data.

LGTM. Thanks!

May 2 2018, 3:26 PM
wmi updated the diff for D45377: [SampleFDO] Don't let inliner treat warm callsite with inline instance in the profile as cold.

Remove SampleProfileHotThreshold. The benchmarks showed no regressions. I am now testing the iterative AFDO result.

May 2 2018, 10:16 AM
wmi added a comment to D46357: [GCOV] Emit the writeout function as nested loops of global data.

The generated code looks great. We may want to do a similar thing in @__llvm_gcov_flush, which currently gets one straight-line memset per counter array, generated by a loop over CountersBySP:
call void @llvm.memset.p0i8.i64(i8* align 16 bitcast ([2 x i64]* @__llvm_gcov_ctr to i8*), i8 0, i64 16, i1 false)
call void @llvm.memset.p0i8.i64(i8* align 16 bitcast ([12 x i64]* @__llvm_gcov_ctr.6 to i8*), i8 0, i64 96, i1 false)
call void @llvm.memset.p0i8.i64(i8* align 16 bitcast ([2 x i64]* @__llvm_gcov_ctr.7 to i8*), i8 0, i64 16, i1 false)
...

May 2 2018, 9:56 AM

May 1 2018

wmi committed rL331286: Use no-op opt run to eliminate the difference in bb pred comment, per…
May 1 2018, 10:23 AM
wmi committed rL331280: Fix the sed command in test which doesn't work well on BSD.
May 1 2018, 9:41 AM
wmi committed rL331266: Fix the issue that ComputeValueKnownInPredecessors only handles the case when
May 1 2018, 7:51 AM
wmi closed D46275: [JumpThreading] ComputeValueKnownInPredecessors only handles the case when phi is on lhs of a comparison op.
May 1 2018, 7:51 AM

Apr 30 2018

wmi added inline comments to D45377: [SampleFDO] Don't let inliner treat warm callsite with inline instance in the profile as cold.
Apr 30 2018, 8:51 PM
wmi updated the diff for D46275: [JumpThreading] ComputeValueKnownInPredecessors only handles the case when phi is on lhs of a comparison op.

Update the patch to fix a bug found in the LLVM bootstrap.

Apr 30 2018, 4:22 PM
wmi updated the diff for D45377: [SampleFDO] Don't let inliner treat warm callsite with inline instance in the profile as cold.

Update the patch to use the solution of comparing total count to hot cutoff threshold.

Apr 30 2018, 4:18 PM
Herald added a reviewer for D45377: [SampleFDO] Don't let inliner treat warm callsite with inline instance in the profile as cold: javed.absar.
In D45377#1071031, @wmi wrote:

The problem with letting the regular inliner handle warm callsites is that the callee's profile may be missing if it is fully inlined. Maybe, instead of comparing total_count/num_callee_bb to the hot threshold, just compare total_count to the hot threshold? I agree this may increase code size a little, but it should not be worse than the previous AFDO binary?

Yes, that is the same concern I raised in my reply to David's suggestion, but the result seems fine. I can measure your suggested approach and see what it looks like.

I tested the solution of comparing total_count to the hot threshold. For the two server benchmarks, performance did not change. For the regressed benchmark, however, it is a little worse than the solution of comparing total_count/num_callee_bb to the hot threshold: in two of my three runs, the regression was larger than the fluctuation range the target benchmark allows. Some other side effect may be taking place here, but for now I don't have a detailed perf profile to find out.

Apr 30 2018, 10:51 AM
wmi created D46275: [JumpThreading] ComputeValueKnownInPredecessors only handles the case when phi is on lhs of a comparison op.
Apr 30 2018, 10:41 AM

Apr 26 2018

wmi created D46127: [RegisterCoalesing] Eliminate unnecessary live range shrinking inside of reMaterializeTrivialDef.
Apr 26 2018, 9:03 AM

Apr 18 2018

wmi added a comment to D45377: [SampleFDO] Don't let inliner treat warm callsite with inline instance in the profile as cold.

The problem with letting the regular inliner handle warm callsites is that the callee's profile may be missing if it is fully inlined. Maybe, instead of comparing total_count/num_callee_bb to the hot threshold, just compare total_count to the hot threshold? I agree this may increase code size a little, but it should not be worse than the previous AFDO binary?

Apr 18 2018, 8:45 AM

Apr 16 2018

wmi added a comment to D45377: [SampleFDO] Don't let inliner treat warm callsite with inline instance in the profile as cold.
In D45377#1068853, @wmi wrote:

Thanks for the fix!

I think maybe a preferred way to fix this is to change SampleProfileLoader::inlineHotFunctions to inline these "warm" inlined callsites early. The current algorithm uses callsiteIsHot, which compares the inline instance's total count to the caller's total count; this could be misleading if the caller is super large/hot. A better algorithm should compare the inline instance's total count to PSI to get a global hotness. This way, if the profile annotator thinks a callsite is not hot, the later inliner should *not* even try to inline it. This makes the design cleaner and more stable. WDYT?

I tried the idea of dividing the inline instance's total count by its BB count and comparing the result to the PSI hot threshold. That improved the regressed benchmark but did not recover the whole regression. That is why I chose to keep the current callsiteIsHot check in the early inliner unchanged: I guessed the regular inliner may be in a better position to decide whether to inline such warm, medium-size callsites.

I suppose the regression comes from iterative-AutoFDO?

Apr 16 2018, 10:51 AM
wmi updated the diff for D45377: [SampleFDO] Don't let inliner treat warm callsite with inline instance in the profile as cold.

I tried David's suggestion and the tests look good: the original regression on the target benchmark was recovered, and we even got a small improvement. Two other server benchmarks showed no performance change.

Apr 16 2018, 9:23 AM
wmi added a comment to D45377: [SampleFDO] Don't let inliner treat warm callsite with inline instance in the profile as cold.

Thanks for the fix!

I think maybe a preferred way to fix this is to change SampleProfileLoader::inlineHotFunctions to inline these "warm" inlined callsites early. The current algorithm uses callsiteIsHot, which compares the inline instance's total count to the caller's total count; this could be misleading if the caller is super large/hot. A better algorithm should compare the inline instance's total count to PSI to get a global hotness. This way, if the profile annotator thinks a callsite is not hot, the later inliner should *not* even try to inline it. This makes the design cleaner and more stable. WDYT?

Apr 16 2018, 9:13 AM

Apr 11 2018

wmi added inline comments to D45377: [SampleFDO] Don't let inliner treat warm callsite with inline instance in the profile as cold.
Apr 11 2018, 10:00 AM

Apr 6 2018

wmi updated the diff for D45377: [SampleFDO] Don't let inliner treat warm callsite with inline instance in the profile as cold.

Update a comment in the code.

Apr 6 2018, 9:23 AM
wmi created D45377: [SampleFDO] Don't let inliner treat warm callsite with inline instance in the profile as cold.
Apr 6 2018, 9:16 AM

Mar 31 2018

wmi accepted D45127: [ThinLTO] Add an import cutoff for debugging/triaging.

The patch was very helpful when I recently triaged a runtime issue that was only exposed in ThinLTO mode. Thanks for contributing it upstream!

Mar 31 2018, 9:10 PM

Mar 27 2018

wmi accepted D44918: [RegisterCoalescing] Don't move COPY if it would interfere with another value.

LGTM. Thanks for the fix.

Mar 27 2018, 10:12 PM

Mar 16 2018

wmi abandoned D44593: [GlobalAA] Create DeletionCallbackHandle whenever a function is inserted into FunctionInfos.

Just saw Chandler already fixed it. Thanks!

Mar 16 2018, 5:00 PM
wmi created D44593: [GlobalAA] Create DeletionCallbackHandle whenever a function is inserted into FunctionInfos.
Mar 16 2018, 4:57 PM

Mar 7 2018

wmi committed rL326905: [SampleFDO] Extend SampleProfReader to handle demangled names.
Mar 7 2018, 8:51 AM
wmi closed D44161: [SampleFDO] Extend SampleProfReader to handle demangled names.
Mar 7 2018, 8:51 AM

Mar 6 2018

wmi added a comment to D44161: [SampleFDO] Extend SampleProfReader to handle demangled names.

Right, binary format does not have this problem.

Mar 6 2018, 11:12 AM
wmi created D44161: [SampleFDO] Extend SampleProfReader to handle demangled names.
Mar 6 2018, 9:33 AM

Feb 15 2018

wmi added a comment to D42848: Correct dwarf unwind information in function epilogue.

Thanks for working on the patch. We count on it to enable libunwind.

Feb 15 2018, 5:18 PM

Jan 30 2018

wmi added a comment to D42667: SplitKit: Fix liveness recomputation in some remat cases.

Do you guys want a 500kb testcase (that is the size after reducing it with bugpoint) in the repository for this?

Jan 30 2018, 2:24 PM

Jan 23 2018

wmi committed rC323281: Adjust MaxAtomicInlineWidth for i386/i486 targets.
Jan 23 2018, 3:29 PM
wmi committed rL323281: Adjust MaxAtomicInlineWidth for i386/i486 targets.
Jan 23 2018, 3:29 PM
wmi closed D42154: Don't generate inline atomics for i386/i486.
Jan 23 2018, 3:29 PM

Jan 22 2018

wmi added a comment to D39053: [Bitfield] Add more cases to making the bitfield a separate location.

Thanks for the size evaluation. I regard the change as a natural and limited extension of the current fine-grained bitfield access mode, so I am OK with it. Hal, what do you think?

Jan 22 2018, 10:44 PM
wmi added a comment to D42154: Don't generate inline atomics for i386/i486.
In D42154#977991, @wmi wrote:

The LLVM backend currently assumes every CPU is Pentium-compatible. If we're going to change that in clang, we should probably fix the backend as well.

With the patch, clang will generate more atomic libcalls than before for i386/i486 targets. The LLVM backend does nothing extra for those libcalls, so no backend fix is necessary for the patch to work.

I think Eli's point is that we do not currently support generating code for the 386 and 486 because there are other things in the x86 backend that assume that the target is at minimum a Pentium. If you're looking to support targeting those chips, you should look into that.

Jan 22 2018, 9:32 AM

Jan 16 2018

wmi added inline comments to D42154: Don't generate inline atomics for i386/i486.
Jan 16 2018, 6:42 PM
wmi added a comment to D42154: Don't generate inline atomics for i386/i486.

The LLVM backend currently assumes every CPU is Pentium-compatible. If we're going to change that in clang, we should probably fix the backend as well.

Jan 16 2018, 6:33 PM
wmi created D42154: Don't generate inline atomics for i386/i486.
Jan 16 2018, 5:40 PM

Nov 13 2017

wmi added a comment to D39053: [Bitfield] Add more cases to making the bitfield a separate location.

I think it may be hard to fix the problem in the backend. A backend fix would face the same store-to-load forwarding issue if the transformation happens in some places but, for some reason, not in others.

Nov 13 2017, 11:16 AM

Oct 16 2017

wmi committed rL315915: [Bitfield] Add an option to access bitfield in a fine-grained manner.
Oct 16 2017, 9:50 AM
wmi closed D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type by committing rL315915: [Bitfield] Add an option to access bitfield in a fine-grained manner.
Oct 16 2017, 9:50 AM

Oct 11 2017

wmi added a comment to D34077: DAGCombine: Combine BUILD_VECTOR to TRUNCATE.

Revert r307036 at r315540 because of PR34919

Oct 11 2017, 5:27 PM
wmi committed rL315540: Revert r307036 because of PR34919.
Oct 11 2017, 5:25 PM
wmi added a comment to rL307036: DAGCombine: Combine BUILD_VECTOR to TRUNCATE.

Hello, we found a bug caused by the patch. Could you help take a look?

Oct 11 2017, 2:41 PM

Oct 8 2017

wmi updated the diff for D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type.

Address Hal's comments.

Oct 8 2017, 10:17 PM

Oct 5 2017

wmi updated the diff for D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type.

Address Hal's comment.

Oct 5 2017, 6:22 PM
wmi added inline comments to D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type.
Oct 5 2017, 6:22 PM

Sep 27 2017

wmi updated the diff for D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type.

Address Hal's comment: split the bitfields into shards separated by the naturally-sized-and-aligned fields.

Sep 27 2017, 3:54 PM
wmi accepted D37832: Eliminate PHI (int typed) which is only used by inttoptr.
Sep 27 2017, 12:05 PM
wmi added inline comments to D37832: Eliminate PHI (int typed) which is only used by inttoptr.
Sep 27 2017, 10:12 AM

Sep 26 2017

wmi added a comment to D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type.

You seem to be only changing the behavior for the "separatable" fields, but I suspect you want to change the behavior for the others too. The bitfield would be decomposed into shards, separated by the naturally-sized-and-aligned fields. Each access only loads its shard. For example, in your test case you have:

struct S3 {
  unsigned long f1:14;
  unsigned long f2:18;
  unsigned long f3:32;
};

and you test that, with this option, loading/storing to a3.f3 only accesses the specific 4 bytes composing f3. But if you load f1 or f2, we're still loading all 8 bytes, right? I think we should only load/store the lower 4 bytes when we access a3.f1 and/or a3.f2.

Sep 26 2017, 4:20 PM

Sep 25 2017

wmi committed rL314145: Reinstall the patch "Use EmitPointerWithAlignment to get alignment information…
Sep 25 2017, 12:59 PM

Sep 22 2017

wmi committed rL313992: [Atomic][X8664] set max atomic inline width according to the target
Sep 22 2017, 9:31 AM
wmi closed D38046: [Atomic][X8664] set max atomic inline/promote width according to the target by committing rL313992: [Atomic][X8664] set max atomic inline width according to the target.
Sep 22 2017, 9:31 AM

Sep 21 2017

wmi updated the diff for D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type.

Changes following the discussion:

Sep 21 2017, 11:29 AM

Sep 20 2017

wmi updated the diff for D38046: [Atomic][X8664] set max atomic inline/promote width according to the target.

Address Eli's comments.

Sep 20 2017, 4:10 PM
wmi added inline comments to D38046: [Atomic][X8664] set max atomic inline/promote width according to the target.
Sep 20 2017, 4:08 PM
wmi added inline comments to D38046: [Atomic][X8664] set max atomic inline/promote width according to the target.
Sep 20 2017, 2:43 PM

Sep 19 2017

wmi updated the diff for D38046: [Atomic][X8664] set max atomic inline/promote width according to the target.

Address Eli's comment.

Sep 19 2017, 6:09 PM
wmi retitled D38046: [Atomic][X8664] set max atomic inline/promote width according to the target from [AtomicExpandPass][X86] set MaxAtomicSizeInBitsSupported according to the target to [Atomic][X8664] set max atomic inline/promote width according to the target.
Sep 19 2017, 6:07 PM
wmi created D38046: [Atomic][X8664] set max atomic inline/promote width according to the target.
Sep 19 2017, 11:23 AM

Sep 15 2017

wmi added inline comments to D37832: Eliminate PHI (int typed) which is only used by inttoptr.
Sep 15 2017, 3:01 PM

Sep 14 2017

wmi added a comment to D18201: Switch over targets to use AtomicExpandPass, and clean up target atomics code.

Is there any plan to push the patch soon? After https://reviews.llvm.org/rL312830, with better alignment information for atomic objects, more atomic loads/stores are generated for 128-bit atomic objects instead of atomic libcalls. On x86-64 targets without cmpxchg16b support, those 128-bit atomic loads/stores are translated into sync_* libcalls. This patch is needed so that atomicExpandPass generates atomic libcalls before isel, instead of isel generating sync_* libcalls.

Sep 14 2017, 5:14 PM

Sep 13 2017

wmi committed rL313199: Add a comment for the test. NFC.
Sep 13 2017, 2:48 PM
wmi committed rL313197: [RegAlloc] Keep a copy of live interval for the spilled vregs in…
Sep 13 2017, 2:43 PM
wmi closed D37578: [RegAlloc] Keep a copy of live interval for the spilled vregs in HoistSpillHelper by committing rL313197: [RegAlloc] Keep a copy of live interval for the spilled vregs in…
Sep 13 2017, 2:43 PM

Sep 8 2017

wmi committed rL312830: Reinstall the patch "Use EmitPointerWithAlignment to get alignment information…
Sep 8 2017, 3:00 PM
wmi committed rL312810: Delete empty file test/CodeGenCXX/atomic-align.cpp after the revert at rL312805.
Sep 8 2017, 11:33 AM
wmi committed rL312805: Revert rL312801 since it generated some calls from libatomic and broke some…
Sep 8 2017, 11:11 AM
wmi added a reverting commit for rL312801: Use EmitPointerWithAlignment to get alignment information of the pointer used…: rL312805: Revert rL312801 since it generated some calls from libatomic and broke some…
Sep 8 2017, 11:11 AM
wmi committed rL312801: Use EmitPointerWithAlignment to get alignment information of the pointer used…
Sep 8 2017, 10:09 AM
wmi closed D37310: [Atomic] Merge alignment information from Decl and from Type when emit atomic expression. by committing rL312801: Use EmitPointerWithAlignment to get alignment information of the pointer used…
Sep 8 2017, 10:09 AM
wmi committed rL312799: Fix a bug for rL312641.
Sep 8 2017, 9:46 AM

Sep 7 2017

wmi updated the diff for D37310: [Atomic] Merge alignment information from Decl and from Type when emit atomic expression.

Address John's comment.

Sep 7 2017, 6:13 PM
wmi created D37578: [RegAlloc] Keep a copy of live interval for the spilled vregs in HoistSpillHelper.
Sep 7 2017, 11:29 AM

Sep 6 2017

wmi added a comment to D37310: [Atomic] Merge alignment information from Decl and from Type when emit atomic expression.

Ping

Sep 6 2017, 9:40 AM
wmi committed rL312641: [TailCall] Allow llvm.memcpy/memset/memmove to be tail calls when parent
Sep 6 2017, 9:06 AM
wmi closed D37406: [TailCall] Allow llvm.memcpy/memset/memmove to be tail calls when parent function return the intrinsics's first argument by committing rL312641: [TailCall] Allow llvm.memcpy/memset/memmove to be tail calls when parent.
Sep 6 2017, 9:06 AM

Sep 1 2017

wmi created D37406: [TailCall] Allow llvm.memcpy/memset/memmove to be tail calls when parent function return the intrinsics's first argument.
Sep 1 2017, 4:53 PM

Aug 30 2017

wmi created D37310: [Atomic] Merge alignment information from Decl and from Type when emit atomic expression.
Aug 30 2017, 2:09 PM
wmi abandoned D37221: [AtomicExpand][X86] Let atomic expand generate inline sequence for unaligned load/store of atomic primitive integer types on x86_64.
Aug 30 2017, 2:03 PM

Aug 29 2017

wmi committed rL312045: [LoopUnswitch] Fix a simple bug which disables loop unswitch for select…
Aug 29 2017, 2:46 PM
wmi closed D36985: [LoopUnswitch] Fix a simple bug which disables loop unswitch for select statement by committing rL312045: [LoopUnswitch] Fix a simple bug which disables loop unswitch for select…
Aug 29 2017, 2:46 PM

Aug 28 2017

wmi added a comment to D36985: [LoopUnswitch] Fix a simple bug which disables loop unswitch for select statement.

Ping.

Aug 28 2017, 5:08 PM
wmi created D37221: [AtomicExpand][X86] Let atomic expand generate inline sequence for unaligned load/store of atomic primitive integer types on x86_64.
Aug 28 2017, 10:36 AM