- User Since
- Nov 23 2015, 1:58 PM (151 w, 5 d)
Wed, Oct 17
Rebase to head. Some changes to alias make this require a much longer chain of stores. The need is clearly less than when I wrote this patch a year ago, but the patch still has a minor positive effect.
Tue, Oct 16
Fix topological ordering construction to prevent premature pruning.
Mon, Oct 15
Oh, good catch. This is O(N^3) for long chains because we try every prefix of the chain, and because the chain is so long, our chain improvement gives up and no-ops, so we just redo the same computation over and over. Cutting the length to any reasonable size makes it O(N^2). I suspect your observed merge size requirement is there to deal with our incomplete handling of chain SubDAGs in store merge (just the children of a single TF, potentially skipping some loads), because it shouldn't matter otherwise (see (1)).
Thu, Oct 11
Good point. Scenario 1 may happen if we have nodes A, B, C, GlueOp, and GlueUser such that there are uses (A->GlueUser, GlueOp->GlueUser, GlueOp->B, B->C) and the node id ordering is GOp < B < A < GU < C. Checking whether A is a predecessor to C would then stop searching at B and fail to see the remaining nodes. I am unsure if this is possible currently given our id scheme, but I'll take a look. I believe it will always bias towards selecting A before B.
Wed, Oct 10
Tue, Oct 9
Rebase to tip.
Rebase to tip and address rksimon's renaming suggestion.
Wed, Oct 3
Tue, Oct 2
Simplify tests. Add comment.
Mon, Oct 1
Fri, Sep 28
Update to better match GCC's behavior. Not as aggressive as we can be.
Wed, Sep 26
Tue, Sep 25
Mon, Sep 24
It looks like you may not have commit access. Would you like me to commit this for you?
Sep 20 2018
Sadly, no clever ideas on my part. If it's not repeatable on any of the in-tree backends, let's just commit this now; it fixes a clear oversight.
Do you have a test case?
Sep 19 2018
I've gone through and marked all the places.
Sep 18 2018
The codesize issues are minor and shouldn't hold this patch up. The only blocker I see is the unnecessary data shuffling for SSE41 codegen which someone else should decide on.
Looks fine modulo the noted issues with load.
Sep 17 2018
Yes, this is a fix to match GCC's register assignment for a 64-bit value in 32-bit mode to pairs of registers.
Sep 14 2018
Sep 13 2018
Huh. It looks like I committed a partial patch. The uploaded patch has only some of my change to remove the breaks from the switch. The EAX, EDX, and ECX cases should also have been returns (or at least have breaks). Corrected patch here.
We should make sure to do the multiplication as (unsigned long long) to give us 64 bits, because vectors on the order of 2^16 are potentially reasonable and would cause an overflow at 32 bits.
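A minimal sketch of the overflow in question, assuming two 32-bit operands that are each on the order of 2^16. The names sizeNarrow, sizeWide, NumElts, and EltBytes are hypothetical and are not taken from the patch under review; the sketch only shows the effect of widening one operand before multiplying.

```cpp
#include <cstdint>

// Hypothetical helper (not from the patch): the product is computed in
// 32 bits, so it wraps modulo 2^32.
constexpr uint32_t sizeNarrow(uint32_t NumElts, uint32_t EltBytes) {
  return NumElts * EltBytes;
}

// Hypothetical helper (not from the patch): casting one operand to
// unsigned long long first forces the multiplication to happen in 64 bits.
constexpr uint64_t sizeWide(uint32_t NumElts, uint32_t EltBytes) {
  return static_cast<unsigned long long>(NumElts) * EltBytes;
}

// 2^16 * 2^16 = 2^32, which wraps to 0 in 32 bits but fits in 64 bits.
static_assert(sizeNarrow(1u << 16, 1u << 16) == 0,
              "32-bit product wraps");
static_assert(sizeWide(1u << 16, 1u << 16) == (1ull << 32),
              "64-bit product is exact");
```

Note that the cast has to apply to an operand, not to the result: casting the finished 32-bit product would widen a value that has already wrapped.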
Sep 12 2018
Sep 10 2018
Match GCC's register assignment behavior. This causes some minor test case reordering from the introduced register classes. Interestingly, unfold-masked-merge-vector-variablemask.ll has slightly fewer spills.
Aug 30 2018
Aug 28 2018
Aug 23 2018
Aug 22 2018
I think you should just return immediately and let it get replaced the next time.
Aug 21 2018
I’m pretty sure the case where UpdateNodeOperands returns an existing node instead of updating N doesn’t work. But I haven’t been able to trigger it with a test.
Aug 16 2018
Aug 15 2018
Aug 14 2018
Jul 23 2018
Jul 20 2018
Jul 18 2018
Remove the expensive check in favor of a more conservative check that LD's chain has one use.
Using a temporary node avoids the intermediate loop issue, though a better temp node structure would be preferable.
But you're correct that any simplification to the AND obviates the need for fixup, so let's use that.