- User Since
- Oct 27 2012, 6:35 AM
Reverse ping, thanks.
The alternative solution landed some time ago.
@rnk done, all tests updated.
LG, thank you.
Something like this yes.
I think someone attempted this previously, and I don't recall why that didn't proceed.
It may be better to do this per-tool.
Let me take a look; I might have an idea how to un-recurse it.
The cost model is wrong: if there are gaps, you need to load the vector of original values and insert the non-gap elements into it.
Seems fine to me.
SGTM, I guess.
Patch description doesn't explain *why* we should do that.
I personally am still having a hard time understanding the final picture. (Same for the loop-idiom pass patch.)
Is the LoopNest Invariant Code Motion pass only about not performing the motion that would break a perfect loop nest?
What about all the code that is no longer moved?
What is the envisioned final pipeline structure?
Do we end up having to run both LNICM and LICM?
LG to me
Not really familiar with this code, but sounds reasonable.
I'm still not fine with this approach:
- 1000 is way too low here. E.g., for znver3 the ideal loop size after unrolling is 4096, which means that even a partial unroll will blow past this threshold.
- The threshold will depend on the underlying libc, since the default stack size differs between them.
- "For testing purposes" implies this never happens in the real world, while the very existence of the patch implies that it does.
Looks good to me
LGTM, thank you.
Mon, Jun 21
Move code out of the loop since it looks cleaner that way.
Hopefully address review notes.
@rnk thank you for taking a look!
Do we need to check that GEP index size is big enough for this?
Should this not be integrated into Instruction::hasMetadataOtherThanDebugLoc() / Instruction::dropUnknownNonDebugMetadata()?
Why is this beneficial?
What about the case where the active bit width decreases? E.g., -129 is i9; incremented, it is -128, which is i8.
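A quick sanity check of that width arithmetic (a Python sketch; `min_signed_bits` is a hypothetical stand-in for the APInt minimal-signed-width query, not the actual API under review):

```python
def min_signed_bits(x: int) -> int:
    """Smallest signed bit width that can represent x in two's complement."""
    # For x >= 0 we need bit_length() value bits plus a sign bit;
    # for x < 0, two's complement can hold down to -2**(n-1).
    return x.bit_length() + 1 if x >= 0 else (-x - 1).bit_length() + 1

assert min_signed_bits(-129) == 9  # -129 needs i9
assert min_signed_bits(-128) == 8  # -129 + 1 == -128 fits back into i8
```

So incrementing a value can legitimately shrink the active bit width, which is the case the question is about.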
The design seems weird here.
I think it would be best to first see the intended use case.
Looks good to me, thank you.
Sun, Jun 20
Autogenerate more, but still not all, affected codegen tests.
Split off address-taken fixes.
Sat, Jun 19
Split off mostly NFC refactor into a separate change.
Thank you for adding the test.
I think you can freely commit this now.
Oh, so wait, we need this for correctness even?
That doesn't look good on the LAA's side.
Fri, Jun 18
I'm asking because I would like to see this change happen, and I don't believe it to be a bad change in general,
but the comments so far have seemed rather dismissive, and it looks like this might end up following in the footsteps
of rG13ec913, where a single platform penalizes/dictates behavior for all other platforms.
I think that function is misdesigned, but I can't suggest an easy, good alternative.
This seems fine to me.
I think there may be some room for undef/poison incoming value handling.
The patch is missing a description.
Thank you for taking a look.
Thinking about it, this change should be split up into an NFC refactor,
removal of the "block must be empty" check,
and several patches enabling each terminator opcode.
Thu, Jun 17
Would it please be possible to either enhance the comments, or port more of the comments from SelectionDAG::computeKnownBits()?
This appears to match that original implementation, and seems correct, but it is hard to read/get through.
Wed, Jun 16
I think this would benefit from adding testcase[s] for the pass that uses this utility, showcasing the improvements here.
This is only going to work if getMinusSCEV() is called directly though.
Glad that we have established this.
Roughly, we can either end up with SCEV being effectively inoperable for non-integral pointers,
or with the users of non-integral pointers having issues lowering the ptrtoint's produced by SCEV.
I don't know which is worse, and I'm presently not sure I care, but I do believe that we cannot
leave SCEV's current casualness of treating pointers as integers as-is.
I actually had some of the patches @efriedma posted locally, and I agree with them.
I don't believe this is correct.
This is either not a bug, or should be dealt with by adding grouping and hiding non-tool cl::opts.
No, I didn't.
I think that while this may be somewhat correct,
it is not really correct. For example, before AVX,
there are no sub-32-bit shuffles, only unpacks.
@RKSimon there are two changes here:
- InVT2Size * 2 == VTSize && InVT1Size == VTSize -> (VTSize % InVT2Size == 0) && InVT1Size == VTSize
- merging two blocks together.
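To make the first change concrete, a sketch of the predicate relaxation (hypothetical helper names; sizes in bits):

```python
def old_cond(in_vt1_size: int, in_vt2_size: int, vt_size: int) -> bool:
    # Original form: the second input must be exactly half the result width.
    return in_vt2_size * 2 == vt_size and in_vt1_size == vt_size

def new_cond(in_vt1_size: int, in_vt2_size: int, vt_size: int) -> bool:
    # Relaxed form: any integral ratio between result and second input.
    return vt_size % in_vt2_size == 0 and in_vt1_size == vt_size

# The exactly-2x case is still accepted:
assert old_cond(256, 128, 256) and new_cond(256, 128, 256)
# A 4x ratio is newly accepted by the relaxed predicate:
assert not old_cond(256, 64, 256) and new_cond(256, 64, 256)
```

The relaxed check is a strict superset of the old one whenever the sizes divide evenly.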
Tue, Jun 15
The pass has now been reverted by rGe52364532afb2748c324f360bc1cc12605d314f3.
FYI this has now been reverted by rGe52364532afb2748c324f360bc1cc12605d314f3.
(This is not offload-specific, right?)
This does not bring any compatibility issues, right?
Does this bring any compatibility issues?