wmi (Wei Mi)
User

Projects

User does not belong to any projects.

User Details

User Since
Feb 20 2015, 10:57 AM (117 w, 3 d)

Recent Activity

Fri, May 19

wmi added a comment to D31821: Remove redundant copy in recurrences.

Hi Taewook,

Fri, May 19, 12:04 PM
wmi updated the diff for D32252: [GVN] Add phi-translate for scalarpre as a temporary solution.

Initialize commutative in Expression's constructor.
Fix a bug related to commutative expressions: the cmp predicate must be swapped whenever the operands are swapped.
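
To illustrate why the predicate has to change together with the operands, here is a minimal sketch. It is not the code from the patch: the names `SWAPPED_PREDICATE` and `canonicalize_cmp` are hypothetical, and operands are modeled as plain integer value numbers.

```python
# Hypothetical illustration only; not taken from GVN or from D32252.
# Maps each comparison predicate to the one that holds after swapping operands.
SWAPPED_PREDICATE = {
    "lt": "gt", "gt": "lt",
    "le": "ge", "ge": "le",
    "eq": "eq", "ne": "ne",
}

def canonicalize_cmp(pred, lhs, rhs):
    """Order the operands by value number; swap the predicate if they move."""
    if lhs > rhs:
        lhs, rhs = rhs, lhs
        pred = SWAPPED_PREDICATE[pred]  # "a < b" is equivalent to "b > a"
    return pred, lhs, rhs
```

Without the predicate swap, `lt 5 3` and `lt 3 5` would be given the same canonical form even though they compute different results.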

Fri, May 19, 10:01 AM

Thu, May 18

wmi committed rL303361: [LSR] Call canonicalize after we generate a new Formula in GenerateTruncates..
Thu, May 18, 10:34 AM

Tue, May 16

wmi added a comment to D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.

I discussed this with Chandler offline, and we decided to split the patch and try to commit the store shrinking part first.

Tue, May 16, 5:20 PM

Sun, May 14

wmi added a comment to D33164: [Profile] Enhance expect lowering to handle correlated branches.

Overall looks good. Some minor comments inlined.

Sun, May 14, 5:37 PM

Thu, May 11

wmi added inline comments to D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.
Thu, May 11, 7:26 AM

Wed, May 10

wmi added inline comments to D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.
Wed, May 10, 10:37 PM
wmi added a comment to D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.

Chandler, thanks for the comments. They are very helpful. I will address them in the next revision. I have only replied to the comments about which I had questions or concerns.

Wed, May 10, 10:43 AM

Tue, May 9

wmi updated the diff for D32252: [GVN] Add phi-translate for scalarpre as a temporary solution.

I changed the DenseMap numberingExpression (mapping from value numbering to GVN::Expression) to a std::vector. There is still a minor increase, but it is better than before.

Tue, May 9, 3:59 PM

Fri, May 5

wmi updated the diff for D32252: [GVN] Add phi-translate for scalarpre as a temporary solution.

Forgot to update the patch.

Fri, May 5, 9:42 AM

Tue, May 2

wmi added a comment to D32563: Add LiveRangeShrink pass to shrink live range within BB..

Ideally there should be a separate pass that runs on the SSA machine code (before register coalescing) to minimize register pressure and hide latency for chains of loads or FP ops. It should work across calls and loop boundaries. You could even do a kind of poor-man's cyclic scheduling this way.

MISched works at the lower level of instruction groups and CPU pipeline hazards. It would be nice if MISched worked at the level of extended basic blocks (it would be easy to implement and has been done out of tree). I don't think it makes as much sense for it to work across call sites though. That is not hard to implement, but it seems it would generate large DAGs and be a bad compile-time tradeoff.

MISched is not a scheduling algorithm, it's a scheduling framework. The generic scheduler is a pile of heuristics that exercise most of the functionality and seems to be working ok for several popular targets. The strategy that it takes is:

  • Make a single scheduling pass handling all heuristics at once. Don't reorder the instructions at all unless the heuristics identify a register pressure or latency problem.
  • Try to determine, before scheduling a block, whether register pressure or latency is likely to become a problem. This avoids the scheduler backing itself into a corner (we don't want the scheduler to backtrack).

You'll notice that this is very conservative with respect to managing compile time and preserving the decisions made by earlier passes.

You could follow that basic strategy and simply adjust the priority of heuristics for your target. You can add better up-front analysis to detect code patterns for each block before prioritizing heuristics. Or you could implement a completely different strategy. For example, schedule for register pressure first, then reschedule for ILP.

I probably won't be able to help you much more than that. Matthias has taken over maintenance of MISched. I think it would help if you gave more background on your situation (sorry if I haven't paid attention or have forgotten). Is this PowerPC? In-order or out-of-order?
Tue, May 2, 5:12 PM
wmi added a comment to D32563: Add LiveRangeShrink pass to shrink live range within BB..

Why are the adds "sunk down" in the first place? Is this reassociation at work?

Tue, May 2, 2:31 PM

Mon, May 1

wmi added inline comments to D32563: Add LiveRangeShrink pass to shrink live range within BB..
Mon, May 1, 4:58 PM
wmi updated subscribers of D32563: Add LiveRangeShrink pass to shrink live range within BB..

+ Andy, for the history on pre-RA-sched and misched.

Mon, May 1, 4:43 PM

Fri, Apr 28

wmi updated the diff for D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.

Address Eli, Matt and Chandler's comments.

Fri, Apr 28, 4:20 PM
wmi added inline comments to D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.
Fri, Apr 28, 4:11 PM

Wed, Apr 26

wmi accepted D32249: [PartialInl] Enhance partial inliner to handle more complex conditions.

LGTM.

Wed, Apr 26, 1:52 PM
wmi added a comment to D32249: [PartialInl] Enhance partial inliner to handle more complex conditions.

Some minor comments.

Wed, Apr 26, 10:47 AM

Tue, Apr 25

wmi added a comment to D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.

Thanks for drafting the comments. They are clearly more descriptive, and I like the variable names (LargeVal and SmallVal), which are much better than the ones I used (OrigVal, MaskedVal). I will rewrite the comments based on your draft.

Tue, Apr 25, 10:29 AM

Apr 21 2017

wmi added a comment to D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.

Thanks for bearing with my poor English. I will fix the terminology and comments according to your suggestions.

Apr 21 2017, 10:11 PM
wmi added inline comments to D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.
Apr 21 2017, 6:07 PM
wmi added inline comments to D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.
Apr 21 2017, 4:31 PM
wmi added a comment to D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.

Ping.

Apr 21 2017, 10:16 AM
wmi committed rL300989: [ConstHoisting] Add BFI in constanthoisting pass and select the best insertion.
Apr 21 2017, 9:03 AM
wmi closed D28962: Add BFI in constanthoisting pass and do the hoisting selectively by committing rL300989: [ConstHoisting] Add BFI in constanthoisting pass and select the best insertion.
Apr 21 2017, 9:03 AM

Apr 20 2017

wmi accepted D32308: Use BasicBlock Util SplitBlock interface to update DT.

LGTM.

Apr 20 2017, 2:06 PM
wmi added a comment to D32249: [PartialInl] Enhance partial inliner to handle more complex conditions.

I am thinking of two more issues:

  1. Is it possible that some of the BBs on the chain are very big, so that we don't want to partially inline them?
  2. The existing pattern handles the if (a || b || c ...) case, but it may not be easy to extend to cases like (a && b && c) and ((a && b) || c). Basically, we want to find a collection of small BBs starting from the entry. The BB collection has only two exits: one of them is ReturnBlock and the other is NonReturnBlock. The edges inside the collection can form any pattern.
Apr 20 2017, 10:18 AM

Apr 19 2017

wmi added a comment to D32252: [GVN] Add phi-translate for scalarpre as a temporary solution.

Daniel, thanks for the comments.

Apr 19 2017, 10:41 PM
wmi added inline comments to D32249: [PartialInl] Enhance partial inliner to handle more complex conditions.
Apr 19 2017, 4:58 PM
wmi created D32252: [GVN] Add phi-translate for scalarpre as a temporary solution.
Apr 19 2017, 3:45 PM

Apr 18 2017

wmi created D32201: [RALLOC] Increase CSR cost in RegAllocGreedy to favour splitting over CSR first use.
Apr 18 2017, 5:02 PM

Apr 17 2017

wmi added a comment to D32037: Change the testcase tail-merge-after-mbp.ll to tail-merge-after-mbp.mir.

Haicheng, the testcase LGTM. Thanks for working on it!

Apr 17 2017, 2:45 PM
wmi committed rL300499: Fix an unused variable error in rL300494..
Apr 17 2017, 2:13 PM
wmi committed rL300494: [SCEV] Add a local cache for getZeroExtendExpr and getSignExtendExpr to prevent.
Apr 17 2017, 1:52 PM
wmi closed D30350: [LSR] Add a cap for reassociation of AllFixupsOutsideLoop type LSRUse to protect compile time by committing rL300494: [SCEV] Add a local cache for getZeroExtendExpr and getSignExtendExpr to prevent.
Apr 17 2017, 1:52 PM
wmi added a comment to D30350: [LSR] Add a cap for reassociation of AllFixupsOutsideLoop type LSRUse to protect compile time.

Sanjoy, thanks for all the helpful comments.

Apr 17 2017, 10:52 AM

Apr 13 2017

wmi updated the diff for D32037: Change the testcase tail-merge-after-mbp.ll to tail-merge-after-mbp.mir.

Remove some code from the test that doesn't impact it.

Apr 13 2017, 1:54 PM
wmi created D32037: Change the testcase tail-merge-after-mbp.ll to tail-merge-after-mbp.mir.
Apr 13 2017, 11:23 AM
wmi updated the diff for D30350: [LSR] Add a cap for reassociation of AllFixupsOutsideLoop type LSRUse to protect compile time.

Address Sanjoy's comments: Add another two proxy functions: getZeroExtendExprCached and getSignExtendExprCached.

Apr 13 2017, 10:00 AM

Apr 12 2017

wmi updated the diff for D30350: [LSR] Add a cap for reassociation of AllFixupsOutsideLoop type LSRUse to protect compile time.

Address Sanjoy's comments: Use SmallDenseMap and lambda.

Apr 12 2017, 6:12 PM
wmi updated the diff for D30350: [LSR] Add a cap for reassociation of AllFixupsOutsideLoop type LSRUse to protect compile time.

Address Sanjoy's comments.

  • Implement local caches for getZeroExtendExpr and getSignExtendExpr.
  • Generate the IR programmatically for the testcase.
Apr 12 2017, 11:09 AM

Apr 10 2017

wmi added a comment to D30350: [LSR] Add a cap for reassociation of AllFixupsOutsideLoop type LSRUse to protect compile time.

Sanjoy, thanks for the comments.

Apr 10 2017, 2:11 PM

Apr 5 2017

wmi added a comment to D23191: [BranchFolding] Restrict tail merging loop blocks after machine block placement.

I am running into a problem with the testcase tail-merge-after-mbp.ll. I am working on a patch related to register allocation, and it somehow triggers the tail merge before block placement, which breaks the checks in the test. The tail merge that is triggered works in exactly the way the test describes.

Apr 5 2017, 11:08 AM
wmi added inline comments to D31679: Use PMADDWD to expand reduction in a loop.
Apr 5 2017, 10:06 AM

Apr 4 2017

wmi added inline comments to D31679: Use PMADDWD to expand reduction in a loop.
Apr 4 2017, 3:46 PM
wmi updated the diff for D30350: [LSR] Add a cap for reassociation of AllFixupsOutsideLoop type LSRUse to protect compile time.

I extended the test, and it now takes more than one hour on my Sandy Bridge machine when built with clang in release mode.
I added early returns in getZeroExtendExpr/getSignExtendExpr for SCEVAddRecExpr with the NW flag. As the test shows, the compile-time explosion can only happen when the step of the SCEVAddRecExpr is negative and the NW flag can be set. With the change, the test now takes less than one second.

Apr 4 2017, 2:08 PM

Apr 3 2017

wmi updated the diff for D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.
  • Address Chandler's comments.
  • Fix unittest errors.
  • Add unittest for load shrinking part. Add the original motivation case as a unittest.
  • Add cost evaluation for the case when there are multi-use nodes inside the shrinking pattern.
Apr 3 2017, 4:44 PM
wmi added a comment to D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.

Chandler, thanks for the review, and sorry for the delayed reply. It took me a while to fix some issues in the patch that I found while adding tests for the load shrinking part and working on the unittests.

Apr 3 2017, 4:07 PM

Mar 25 2017

wmi added a comment to D30350: [LSR] Add a cap for reassociation of AllFixupsOutsideLoop type LSRUse to protect compile time.

Quentin, that is a really good finding, thanks a lot! I was misled by the large number of reassociation candidates, and I have verified that the non-linear increase in compile time is indeed caused by SCEVExpand!

Mar 25 2017, 5:53 PM

Mar 24 2017

wmi retitled D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently from [InstCombine] Redo reduceLoadOpStoreWidth in instcombine for bitfield store optimization. to [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.
Mar 24 2017, 3:57 PM
wmi updated the diff for D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.

Revamp the patch.

Mar 24 2017, 3:53 PM

Mar 8 2017

wmi added a comment to D30350: [LSR] Add a cap for reassociation of AllFixupsOutsideLoop type LSRUse to protect compile time.

Thanks for the comment.

Mar 8 2017, 2:43 PM

Mar 6 2017

wmi added a comment to D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.

Could you add a test-case like this?

Sure. I will add such a testcase after the other major issues are solved.

Mar 6 2017, 9:03 PM

Mar 4 2017

wmi added a comment to D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.

Although I didn't find any regressions in internal benchmark testing, I still moved the transformation to codegenprepare because we want to use TargetLowering information to decide how to shrink in some cases.

Mar 4 2017, 12:49 PM
wmi updated the diff for D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.

Update patch according to Eli's comments.

Mar 4 2017, 12:46 PM

Feb 27 2017

wmi added inline comments to D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.
Feb 27 2017, 10:02 PM
wmi added inline comments to D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.
Feb 27 2017, 5:14 PM
wmi added a comment to D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.

Your testcase has dead instructions (%conv).

Will fix it.

Feb 27 2017, 3:01 PM
wmi updated the diff for D30350: [LSR] Add a cap for reassociation of AllFixupsOutsideLoop type LSRUse to protect compile time.

Fixed a typo in the testcase.

Feb 27 2017, 2:26 PM
wmi updated the diff for D30350: [LSR] Add a cap for reassociation of AllFixupsOutsideLoop type LSRUse to protect compile time.

Add a testcase by checking the LSR debug output.

Feb 27 2017, 2:24 PM
wmi created D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.
Feb 27 2017, 10:39 AM
wmi added a comment to D30350: [LSR] Add a cap for reassociation of AllFixupsOutsideLoop type LSRUse to protect compile time.

I don't like having a hanging test that is killed after a timeout. I will try to create another testcase by checking the LSR trace.

Feb 27 2017, 9:53 AM

Feb 24 2017

wmi created D30350: [LSR] Add a cap for reassociation of AllFixupsOutsideLoop type LSRUse to protect compile time.
Feb 24 2017, 1:41 PM

Feb 22 2017

wmi committed rL295884: [LSR] Canonicalize formula and put recursive Reg related with current loop in….
Feb 22 2017, 1:58 PM
wmi closed D26781: [LSR] Canonicalize formula and put recursive Reg related with current loop in ScaledReg. by committing rL295884: [LSR] Canonicalize formula and put recursive Reg related with current loop in….
Feb 22 2017, 1:58 PM

Feb 16 2017

wmi committed rL295378: [LSR] Prevent formula with SCEVAddRecExpr type of Reg from Sibling loops.
Feb 16 2017, 1:39 PM
wmi closed D30021: [LSR] Prevent formula with SCEVAddRecExpr type of Reg from Sibling loops by committing rL295378: [LSR] Prevent formula with SCEVAddRecExpr type of Reg from Sibling loops.
Feb 16 2017, 1:39 PM

Feb 15 2017

wmi created D30021: [LSR] Prevent formula with SCEVAddRecExpr type of Reg from Sibling loops.
Feb 15 2017, 6:44 PM

Feb 10 2017

wmi committed rL294814: [LSR] Recommit: Allow formula containing Reg for SCEVAddRecExpr related with….
Feb 10 2017, 5:02 PM
wmi added a comment to D27366: [PowerPC][WIP] Provide context-sensitive cost to the Greedy Allocator to favour splitting over CSR first use.

Hi Nemanja,

Feb 10 2017, 2:59 PM
wmi abandoned D27596: Add a PreRASplit pass to enable more shrinkwrap.

Sorry, wrong reply. I am dropping this patch and will try to push https://reviews.llvm.org/D27366 instead.

Feb 10 2017, 2:58 PM
wmi added a comment to D27596: Add a PreRASplit pass to enable more shrinkwrap.

Hi Nemanja,

Feb 10 2017, 2:54 PM

Feb 8 2017

wmi added a comment to D27366: [PowerPC][WIP] Provide context-sensitive cost to the Greedy Allocator to favour splitting over CSR first use.

nemanjai, do you have any update on the patch? We also depend on it on the x86 side, so we would like to know the status.

Feb 8 2017, 11:21 AM

Feb 7 2017

wmi updated the diff for D26429: [LSR] Allow formula containing Reg for SCEVAddRecExpr with loop other than current loop.

First, sorry to revisit the patch after such a long time.

Feb 7 2017, 5:38 PM

Feb 6 2017

wmi accepted D29611: RegisterCoalescer: Fix joinReservedPhysReg().

LGTM with a nitpick.

Feb 6 2017, 5:24 PM
wmi added a comment to D29436: RegisterCoalescer: Fix joinReservedPhysReg().

Sorry to chime in late.

Feb 6 2017, 3:19 PM

Feb 5 2017

wmi updated the diff for D28962: Add BFI in constanthoisting pass and do the hoisting selectively.

Make changes to save compile time and add an assertion, per David's suggestions.

Feb 5 2017, 11:16 PM
wmi added inline comments to D28962: Add BFI in constanthoisting pass and do the hoisting selectively.
Feb 5 2017, 11:01 PM

Feb 3 2017

wmi added inline comments to D28962: Add BFI in constanthoisting pass and do the hoisting selectively.
Feb 3 2017, 5:16 PM

Feb 2 2017

wmi updated the diff for D28962: Add BFI in constanthoisting pass and do the hoisting selectively.

Remove MadeChange. We will still need to insert const materialization code even if there is no hoisting or merging for BBs.

Feb 2 2017, 3:13 PM

Feb 1 2017

wmi added inline comments to D28962: Add BFI in constanthoisting pass and do the hoisting selectively.
Feb 1 2017, 5:12 PM

Jan 30 2017

wmi updated the diff for D28962: Add BFI in constanthoisting pass and do the hoisting selectively.

Addressed David's comments.

Jan 30 2017, 9:54 AM

Jan 27 2017

wmi added inline comments to D28962: Add BFI in constanthoisting pass and do the hoisting selectively.
Jan 27 2017, 4:53 PM
wmi updated the diff for D28962: Add BFI in constanthoisting pass and do the hoisting selectively.

Update the patch according to David's suggestion: use BFI to find the optimal solution to the insertion problem.

Jan 27 2017, 11:33 AM

Jan 24 2017

wmi committed rL292984: Revert rL292621. Caused some internal build bot failures in apple..
Jan 24 2017, 2:26 PM
wmi added a reverting commit for rL292621: [RegisterCoalescing] Recommit the patch "Remove partial redundent copy".: rL292984: Revert rL292621. Caused some internal build bot failures in apple..
Jan 24 2017, 2:26 PM

Jan 22 2017

wmi added inline comments to D28962: Add BFI in constanthoisting pass and do the hoisting selectively.
Jan 22 2017, 11:22 AM

Jan 20 2017

wmi created D28962: Add BFI in constanthoisting pass and do the hoisting selectively.
Jan 20 2017, 12:02 PM
wmi committed rL292621: [RegisterCoalescing] Recommit the patch "Remove partial redundent copy"..
Jan 20 2017, 9:50 AM

Jan 17 2017

wmi committed rL292327: Revert rL292292 since it causes a SEGV on sanitizer-x86_64-linux-fuzzer build….
Jan 17 2017, 6:05 PM
wmi added a reverting commit for rL292292: [RegisterCoalescing] Remove partial redundent copy.: rL292327: Revert rL292292 since it causes a SEGV on sanitizer-x86_64-linux-fuzzer build….
Jan 17 2017, 6:05 PM
wmi committed rL292292: [RegisterCoalescing] Remove partial redundent copy..
Jan 17 2017, 3:50 PM
wmi closed D28585: [RegisterCoalescing] Remove partial redundent copy by committing rL292292: [RegisterCoalescing] Remove partial redundent copy..
Jan 17 2017, 3:50 PM

Jan 13 2017

wmi updated the diff for D28585: [RegisterCoalescing] Remove partial redundent copy.

Address Noel, Quentin and Matthias's comments.

Jan 13 2017, 12:50 PM
wmi added a comment to D28585: [RegisterCoalescing] Remove partial redundent copy.

Thanks for the review.

Jan 13 2017, 12:48 PM

Jan 11 2017

wmi retitled D28585: [RegisterCoalescing] Remove partial redundent copy from to [RegisterCoalescing] Remove partial redundent copy.
Jan 11 2017, 3:43 PM

Dec 22 2016

wmi committed rL290365: Redo store splitting in CodeGenPrepare..
Dec 22 2016, 11:55 AM
wmi closed D25914: Redo store splitting in CodeGenPrepare by committing rL290365: Redo store splitting in CodeGenPrepare..
Dec 22 2016, 11:55 AM
wmi committed rL290363: Change the interface of TLI.isMultiStoresCheaperThanBitsMerge..
Dec 22 2016, 11:49 AM
wmi closed D24707: Change the interface of TLI.isMultiStoresCheaperThanBitsMerge by committing rL290363: Change the interface of TLI.isMultiStoresCheaperThanBitsMerge..
Dec 22 2016, 11:49 AM