mkuper (Michael Kuperstein)
User

Projects

User does not belong to any projects.

User Details

User Since
Nov 5 2014, 12:57 AM (210 w, 5 d)

Recent Activity

Dec 6 2017

mkuper accepted D40617: [LV] Interleaved access vectorization: fix computing new alias info.

LGTM

Dec 6 2017, 1:15 PM
mkuper added inline comments to D40617: [LV] Interleaved access vectorization: fix computing new alias info.
Dec 6 2017, 11:53 AM

Nov 6 2017

mkuper added a reviewer for D39636: [X86] Don't clobber reserved registers with stack adjustments: rnk.

I'm not sure whether it's the right fix.
I would expect the clobbers list of an instruction to be precise - if it says it's clobbered, you can't rely on it being preserved across the call, whatever the circumstances. So maybe the right thing to do would be to filter reserved registers out of the clobber list for these invokes when they're lowered?

Nov 6 2017, 1:57 PM

Oct 16 2017

mkuper added a comment to D38676: [LV] Model masking in VPlan, introducing VPInstructions.

A minor nit but LGTM. I don't have an aversion to "dead code" if it's going to be used in the near future, so perhaps to make the patch smaller split it into an initial patch to add VPlanValue.h and then the VPlanBuilder.h as well as the changes to the documentation. Those pieces seem uncontroversial to me, before moving onto the vectorizer. @mkuper does this make sense?

Oct 16 2017, 2:42 PM

Sep 5 2017

mkuper edited reviewers for D37473: [X86][AsmParser] fix PR32035, added: craig.topper; removed: mkuper.
Sep 5 2017, 10:30 AM

Aug 9 2017

mkuper added a reviewer for D36130: [SLP] Vectorize jumbled memory loads.: danielcdh.
Aug 9 2017, 9:17 AM
mkuper added reviewers for D36130: [SLP] Vectorize jumbled memory loads.: Ayal, zvi.
Aug 9 2017, 9:15 AM

Aug 8 2017

mkuper accepted D32871: [LV] Using VPlan to model the vectorized code and drive its transformation.

I think the revision summary doesn't fully represent what's going on here at this point. :-)

Aug 8 2017, 1:45 AM

Jul 27 2017

mkuper added inline comments to D30200: [SLP] Fix for PR31880: shuffle and vectorize repeated scalar ops on extracted elements.
Jul 27 2017, 2:45 PM

Jul 18 2017

mkuper added a comment to D35227: [LV] Don't allow outside uses of IVs if the SCEV is predicated on loop conditions.

Thanks, Gil!

Jul 18 2017, 11:00 AM
mkuper added a comment to D35498: [LoopVectorizer] Use two step casting for float to pointer types..

Anyway, I agree this isn't the right fix, but my gut feeling is that the right fix is to actually allow the builder to create a bitcast between a pointer and a pointer-sized float.
I don't see anything in the langref that makes the semantics of this undefined. Renato, what kind of semantic issues do you see here?

I agree the vectorizer can "safely" assume this case is ok because we know the original datatype was a pointer anyway and this is part of a strided load, but I feel we'd be opening a can of worms if we start allowing any float<->pointer conversion by default.

For example, a different case would be pointer->float->double->pointer. C has automatic promotions, and some corner cases may slip and create that sequence, which would destroy the bit-pattern and therefore the memory address.

So, if we can do this in this specific case, Manoj's current fix is "better" than moving it up CreateBitOrPointerCast, because we know what the semantics is. Or maybe we just load them as "data" (i32/i64?) and then bitcast safely?

Makes sense?

Jul 18 2017, 10:29 AM

Jul 17 2017

mkuper added a comment to D35498: [LoopVectorizer] Use two step casting for float to pointer types..

Should I update the Builder's CreateBitOrPointerCast function to handle float to pointer casts using float -> int and int -> pointer casts? I can do that or I can create a local pass specific casting function to handle float -> ptr + other CreateBitOrPointerCast routines used here.

Why can't you do the bitcast directly inside CreateBitOrPointerCast()? You'll also want to upadte isBitOrNoopPointerCastable(),
Anyway, you probably want a different set of reviewers for that patch - I'm really not the authority on that, and at least Renato seems to have a completely different opinion.

Jul 17 2017, 3:33 PM
mkuper added a comment to D35488: [LoopInterchange] Split up interchange.ll test case (NFC)..

We don't really have a policy on this either way, I think.
I certainly don't have any strong opinions about it.

Jul 17 2017, 2:00 PM
mkuper added a comment to D35498: [LoopVectorizer] Use two step casting for float to pointer types..

Setting aside from the fact that no one should ever be casting floating point to pointers, perhaps we shouldn't be even trying to do this transformation here, as I'm not sure this could have a guaranteed semantics in this case, and probably better to just bail the vectorisation?

Jul 17 2017, 1:53 PM

Jul 12 2017

mkuper committed rL307837: [LV] Don't allow outside uses of IVs if the SCEV is predicated on loop….
[LV] Don't allow outside uses of IVs if the SCEV is predicated on loop…
Jul 12 2017, 12:54 PM
mkuper closed D35227: [LV] Don't allow outside uses of IVs if the SCEV is predicated on loop conditions by committing rL307837: [LV] Don't allow outside uses of IVs if the SCEV is predicated on loop….
Jul 12 2017, 12:54 PM

Jul 11 2017

mkuper requested changes to D35264: [LICM] Teach LICM to hoist conditional loads.

This looks wrong.
In particular, it would break for something like:

Jul 11 2017, 11:23 AM
mkuper added a comment to D35227: [LV] Don't allow outside uses of IVs if the SCEV is predicated on loop conditions.

Hi Michael,

I'm probably missing something obvious here. For external IV uses, we compute the end IV value assuming we're coming from the vector loop. But if we executed the vector loop, shouldn't the SCEV predicate have been true, which would mean that the PSEV was a safe assumption?

Jul 11 2017, 10:58 AM
mkuper accepted D34150: [LV] Test once if vector trip count is zero, instead of twice.

Thanks, Wei.
LGTM.

Jul 11 2017, 10:39 AM

Jul 10 2017

mkuper created D35227: [LV] Don't allow outside uses of IVs if the SCEV is predicated on loop conditions.
Jul 10 2017, 4:26 PM

Jul 9 2017

mkuper added a reviewer for D34150: [LV] Test once if vector trip count is zero, instead of twice: wmi.

Re overflow - the point is that getOrCreateTripCount() returns, basically, PSE.getBackedgeTakenCount() + 1, and that may overflow, so the "trip count" may end up being 0 if the backedge taken count is 0. I don't think this is outdated, and this is behavior we want to preserve. But this patch should preserve this behavior IIUC. Can you make sure there's a test for this?

Jul 9 2017, 6:13 AM

Jul 6 2017

mkuper added a comment to D34555: [NVPTX] Add lowering of i128 params..

I've reverted this in r307334 because it breaks a lot of clang tests with:

Jul 6 2017, 4:25 PM
mkuper committed rL307334: Reverting r307326 because it breaks clang tests..
Reverting r307326 because it breaks clang tests.
Jul 6 2017, 4:25 PM
mkuper committed rL307326: [NVPTX] Add lowering of i128 params..
[NVPTX] Add lowering of i128 params.
Jul 6 2017, 3:19 PM
mkuper closed D34555: [NVPTX] Add lowering of i128 params. by committing rL307326: [NVPTX] Add lowering of i128 params..
Jul 6 2017, 3:19 PM

Jun 28 2017

mkuper accepted D34756: [SLPVectorizer] Introducing getTreeEntry() [NFC].
Jun 28 2017, 11:32 AM
mkuper added inline comments to D34756: [SLPVectorizer] Introducing getTreeEntry() [NFC].
Jun 28 2017, 10:58 AM

Jun 26 2017

mkuper accepted D34473: [LV] Changing the interface of ValueMap.

LGTM, but please wait for Matt.

Jun 26 2017, 8:42 AM

Jun 25 2017

mkuper added a comment to D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

This patch is proving difficult to split into smaller chunks.

@mkuper do you have any suggestions on what can be done to get it done in time for Clang 5.00?

Jun 25 2017, 11:05 PM

Jun 23 2017

mkuper added a comment to D34473: [LV] Changing the interface of ValueMap.

Michael, do you have any issues with moving to this new interface for the ValueMap?

Makes sense to me, modulo a couple of nits.

Jun 23 2017, 4:32 PM

Jun 19 2017

mkuper edited reviewers for D34078: Enable support for floating-point division reductions, added: jmolloy; removed: llvm-commits.
Jun 19 2017, 5:38 PM

Jun 9 2017

mkuper added inline comments to D32871: [LV] Using VPlan to model the vectorized code and drive its transformation.
Jun 9 2017, 6:21 PM

Jun 4 2017

mkuper added a comment to D32871: [LV] Using VPlan to model the vectorized code and drive its transformation.

Sorry it's taking me so long to get to this.
I've only looked at parts of the code, more forthcoming in the next few days. :-)

Jun 4 2017, 7:01 PM

Jun 2 2017

mkuper added reviewers for D32451: Improve profile-guided heuristics to use estimated trip count.: danielcdh, wmi.

This LGTM in terms of being "the right thing to do", but I'm really not sure whether the branch information we have right now is precise enough right now to enable this, especially when using sampling-based fdo.
wmi/dehao, any chance you could check if this causes regressions for us?

Jun 2 2017, 10:28 AM

May 11 2017

mkuper added a reviewer for D33097: [x86] Follow-up to r302785 - don't widen vselect conditions when the vNi1 type being split into is a legal type (like it usually is w/ AVX-512).: zvi.
May 11 2017, 10:33 AM

May 9 2017

mkuper accepted D32245: Add an IR expansion pass for the experimental reductions.

LGTM, with a nit.

May 9 2017, 3:38 PM

May 8 2017

mkuper accepted D32969: [LV] Fix insertion point for shuffle vectors in first order recurrence.

LGTM

May 8 2017, 3:27 PM

Apr 26 2017

mkuper added inline comments to D32245: Add an IR expansion pass for the experimental reductions.
Apr 26 2017, 10:45 AM
mkuper added a comment to D32533: [SLPVectorizer] Limit the number of block chain instructions to max register size.
In D32533#738239, @anna wrote:

okay, that explains why we do limit to the target vector register *only* for the store chain in SLP (compile time benefit). However, if we look at the Loop Vectorizer, we consider the maximum vector register size when generating the code in the IR. This also gives a more accurate cost model for LV.

Not considering the physical vector register size is limiting the SLP cost model right? For example, in the target, we would have 4 shuffles instead of a single shuffle.

Apr 26 2017, 10:37 AM

Apr 25 2017

mkuper added reviewers for D32245: Add an IR expansion pass for the experimental reductions: delena, rengolin.

Adding some target people - I think they ought to care about this more than I do. :-)

Apr 25 2017, 6:10 PM
mkuper accepted D32445: [LV] Handle external uses of floating-point induction variables.

Ok, for fast-math, I think this makes sense.
LGTM.

Apr 25 2017, 2:01 PM
mkuper added a comment to D32404: [LV] Make LIT test insensitive to basic block numbering.

Honestly, I don't feel strongly about this.

Apr 25 2017, 10:42 AM

Apr 24 2017

mkuper added a comment to D32451: Improve profile-guided heuristics to use estimated trip count..
In D32451#735969, @twoh wrote:

@mkuper This helps with our internal benchmarks, but I don't have numbers with public benchmarks. What would be the good benchmarks to evaluate the vectorization? Thanks!

Apr 24 2017, 2:06 PM
mkuper added a comment to D32451: Improve profile-guided heuristics to use estimated trip count..

When I tried to play with this kind of thing, I got rather inconsistent performance results, because of things like a cold loop within a relatively hot loop.
Do you have any performance numbers for this?

Apr 24 2017, 1:56 PM
mkuper added a comment to D32445: [LV] Handle external uses of floating-point induction variables.

I'm not sure. Do we currently (I mean, without this patch) do anything with FP IVs that violates spec?

Apr 24 2017, 1:39 PM
mkuper added a comment to D32445: [LV] Handle external uses of floating-point induction variables.

Yes, that should have been "trip count - 1", sorry.

Apr 24 2017, 11:25 AM
mkuper added a comment to D32445: [LV] Handle external uses of floating-point induction variables.

The original PR looks fishy to me, but I agree this is a real issue regardless.

Apr 24 2017, 11:12 AM
mkuper accepted D32224: [LV] Remove redundant basic block split.

Makes sense. LGTM.

Apr 24 2017, 10:30 AM

Apr 20 2017

mkuper added a comment to D32330: X86: Don't emit zero-byte functions on Windows.

What does MSVC do for this? Is MSVC's behavior predicated on /guard:cf? Is it ok to generate empty functions without /guard:cf? Crash on calling an empty function seems like somewhat non-intuitive behavior.
Also, what about thumbv7-windows?

Apr 20 2017, 5:51 PM
mkuper added a reviewer for D32200: [LV] Refactor ILV.vectorize[Loop]() by introducing LVP.executePlan(): mssimpso.
Apr 20 2017, 11:57 AM

Apr 19 2017

mkuper accepted D32236: PR32710: Disable using PMADDWD for unsigned short..

LGTM

Apr 19 2017, 11:45 AM

Apr 17 2017

mkuper accepted D32054: [LV] Cache block mask values.

LGTM.

Apr 17 2017, 7:21 PM
mkuper accepted D32065: [SLP] Allow phi node reordering in tryToVectorizeList..

This looks ok, in the sense that it doesn't make things any worse, and will catch more cases - but I think it, unfortunately, doesn't really make things better either.
I don't think we can generalize this (because it blows up exponentially), so eventually we'll have to fix the underlying problems with the SLP "tree".

Apr 17 2017, 7:20 PM
mkuper accepted D32126: [CodeGenPrepare] Fix crash due to an invalid CFG.

LGTM, Thanks!

Apr 17 2017, 10:25 AM

Apr 13 2017

mkuper accepted D31753: [LoopPeeling] Fix condition for phi-eliminating peeling.

LGTM

Apr 13 2017, 6:01 PM
mkuper accepted D32040: [LV] Remove implicit single basic block assumption.

LGTM

Apr 13 2017, 1:00 PM

Apr 12 2017

mkuper accepted D31997: [LV] Refactor ILV to provide vectorizeInstruction() from vectorizeBlockInLoop().

LGTM with minor nit about comment.

Apr 12 2017, 4:35 PM
mkuper accepted D31953: Use methods to access data stored with frame instructions.

LGTM, except for the comments.

Apr 12 2017, 10:44 AM

Apr 10 2017

mkuper accepted D31906: [LV] Model if-converted phi node cost.

LGTM

Apr 10 2017, 3:10 PM
mkuper added a reviewer for D31833: [x86] Relax the check in areLoadsFromSameBasePtr: craig.topper.

This makes sense to me, but I don't understand why the Scale == 1 requirement was there to begin with.
+ctopper for another look.

Apr 10 2017, 3:01 PM

Apr 6 2017

mkuper accepted D31679: Use PMADDWD to expand reduction in a loop.

LGTM

Apr 6 2017, 10:00 PM
mkuper added a comment to D31679: Use PMADDWD to expand reduction in a loop.

I suggest we leave the PMADDUBSW discussion for a separate patch.

Apr 6 2017, 6:25 PM
mkuper committed rL299720: [X86] Revert r299387 due to AVX legalization infinite loop..
[X86] Revert r299387 due to AVX legalization infinite loop.
Apr 6 2017, 3:46 PM

Apr 3 2017

mkuper accepted D31577: Add 64 bit pattern matching for PSADBW.

LGTM

Apr 3 2017, 10:23 AM
mkuper added inline comments to D31577: Add 64 bit pattern matching for PSADBW.
Apr 3 2017, 12:39 AM
mkuper added a comment to D31577: Add 64 bit pattern matching for PSADBW.

Ohhh, I think I got it now. Sorry for the noise, let me think about it. :-)

Apr 3 2017, 12:27 AM
mkuper added a comment to D31577: Add 64 bit pattern matching for PSADBW.

**haven't really looked.

Apr 3 2017, 12:25 AM
mkuper added inline comments to D31577: Add 64 bit pattern matching for PSADBW.
Apr 3 2017, 12:22 AM

Apr 2 2017

mkuper added inline comments to D31577: Add 64 bit pattern matching for PSADBW.
Apr 2 2017, 10:43 AM

Mar 24 2017

mkuper accepted D31281: [LoopUnroll] Remap references in peeled iteration.

LGTM, but please fix the test before committing.

Mar 24 2017, 1:15 PM

Mar 23 2017

mkuper requested changes to D31281: [LoopUnroll] Remap references in peeled iteration.
Mar 23 2017, 12:51 PM

Mar 21 2017

mkuper added inline comments to D30710: [LV] Vectorize GEPs.
Mar 21 2017, 1:20 AM

Mar 19 2017

mkuper added a comment to D30710: [LV] Vectorize GEPs.

Sorry for the delay, Matt.

Mar 19 2017, 4:40 PM

Mar 16 2017

mkuper committed rL297993: [LoopUnroll] Don't peel loops where the latch isn't the exiting block.
[LoopUnroll] Don't peel loops where the latch isn't the exiting block
Mar 16 2017, 2:20 PM
mkuper closed D30757: [LoopUnroll] Don't peel loops where the latch isn't the exiting block by committing rL297993: [LoopUnroll] Don't peel loops where the latch isn't the exiting block.
Mar 16 2017, 2:20 PM
mkuper added a comment to D30757: [LoopUnroll] Don't peel loops where the latch isn't the exiting block.

Re the PS - I don't know.
As I wrote above - UnrollLoop() verifies the latch is conditional, but doesn't verify the two edges are actually going to the header and the exit block. But we haven't actually *seen* any issues with this, as far as I know.

Mar 16 2017, 1:58 PM
mkuper added a comment to D30710: [LV] Vectorize GEPs.

Sorry, lost track of this. I'll try to get to this today/tomorrow, unless Elena does first.

Mar 16 2017, 1:50 PM
mkuper updated the diff for D30757: [LoopUnroll] Don't peel loops where the latch isn't the exiting block.
Mar 16 2017, 1:42 PM

Mar 15 2017

mkuper added a comment to D30757: [LoopUnroll] Don't peel loops where the latch isn't the exiting block.

Do you know if we already have a utility somewhere that detects this?

Can we require the loop to be in a simplified form and its latch to be exiting? Will it suffice?

Mar 15 2017, 5:23 PM
mkuper added a comment to D30757: [LoopUnroll] Don't peel loops where the latch isn't the exiting block.

Looks good to me, but, as you noticed in the PR - do we want to work with such loops at all? Did you check what happens if we just bail out on such loops?

Michael

Mar 15 2017, 4:58 PM
mkuper added a comment to D30757: [LoopUnroll] Don't peel loops where the latch isn't the exiting block.

ping

Mar 15 2017, 3:58 PM

Mar 10 2017

mkuper committed rL297493: [SLP] Revert everything that has to do with memory access sorting..
[SLP] Revert everything that has to do with memory access sorting.
Mar 10 2017, 11:11 AM

Mar 8 2017

mkuper accepted D30715: [LV] A unified scalarizeInstruction() for Vectorizer and Unroller.

LGTM, Thanks!
(If you're not committing the addMetaData fix immediately after this, please put a FIXME above the "if (VF > 1)")

Mar 8 2017, 5:29 PM
mkuper created D30757: [LoopUnroll] Don't peel loops where the latch isn't the exiting block.
Mar 8 2017, 2:23 PM
mkuper added a comment to D30731: [SLP] Visualize SLP trees with -view-slp-tree.

The SLP part LGTM - it's less broken than the rest of the code, and we need to fix SLP itself to fix it (and avoid things like the Container hack...)

Mar 8 2017, 10:16 AM
mkuper added a comment to D30731: [SLP] Visualize SLP trees with -view-slp-tree.

Thank you, this will be incredibly helpful. I don't think it quite works, though - see inline.

Mar 8 2017, 12:04 AM

Mar 7 2017

mkuper accepted D30653: [LV] Refactor Cost Model's selectVectorizationFactor(), driven by a LoopVectorizationPlanner.

LGTM

Mar 7 2017, 2:29 PM
mkuper added a comment to D30638: [SLP] Fixed non-determenistic behavior in Loop Vectorizer..

LGTM.

Mar 7 2017, 1:48 PM
mkuper accepted D30638: [SLP] Fixed non-determenistic behavior in Loop Vectorizer..

LGTM, modulo the minor issues inline.

Mar 7 2017, 1:08 PM
mkuper added inline comments to D30638: [SLP] Fixed non-determenistic behavior in Loop Vectorizer..
Mar 7 2017, 12:43 PM
mkuper added a comment to D30686: [SLP] PR32078: convert scalar operations to vector..

I haven't looked at the patch in detail yet, but I don't understand the rationale for putting it in the SLP vectorizer.
If transforming this pattern is a win, it's a win regardless of whether it originated in the SLP vectorizer, or just happened to appear in the IR.

Mar 7 2017, 11:36 AM
mkuper added a comment to D30649: [SLP] Function for instruction cost calculation, NFC..

This makes sense. Some minor comments inline.

Mar 7 2017, 11:17 AM
mkuper added a comment to D30638: [SLP] Fixed non-determenistic behavior in Loop Vectorizer..

Notice that we will still need to iterate on the ordered set with a "while(!empty())" because we are removing entries from the Set while iterating on it.

Oy. That's not good. Removal from SetVector (except with pop_back) is linear-time... :-\

Mar 7 2017, 10:54 AM

Mar 6 2017

mkuper added a comment to D30632: [LoopUnrolling] Fix loop size check for peeling.

Do you need someone to check this in for you?

Mar 6 2017, 10:12 PM
mkuper added inline comments to D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.
Mar 6 2017, 8:02 PM
mkuper added inline comments to D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.
Mar 6 2017, 7:42 PM
mkuper added a comment to D30632: [LoopUnrolling] Fix loop size check for peeling.

LGTM.
Also, forgot it before, sorry - please add a test. (No need for a huge test, you can force a small threshold.)

Mar 6 2017, 7:29 PM
mkuper added inline comments to D30587: [LV] Delete unneeded scalar GEP creation code.
Mar 6 2017, 6:18 PM
mkuper added inline comments to D30653: [LV] Refactor Cost Model's selectVectorizationFactor(), driven by a LoopVectorizationPlanner.
Mar 6 2017, 4:18 PM
mkuper committed rL297103: [SLP] Revert r296863 due to miscompiles..
[SLP] Revert r296863 due to miscompiles.
Mar 6 2017, 4:07 PM