Page MenuHomePhabricator

mivnay (Vinay Madhusudan)
User

Projects

User does not belong to any projects.

User Details

User Since
Oct 21 2016, 3:46 AM (208 w, 3 d)

Recent Activity

Thu, Oct 15

mivnay added a comment to D88819: [LV] Support for Remainder loop vectorization.

I will try to summarize the current changes done in the below C code and also try to answer some of the common questions raised.

Thu, Oct 15, 8:07 AM · Restricted Project
mivnay added a comment to D88819: [LV] Support for Remainder loop vectorization.

ping!

Thu, Oct 15, 2:45 AM · Restricted Project

Wed, Oct 14

mivnay committed rG159b2d8e62b8: [AArch64] Combine UADDVs to generate vector add (authored by mivnay).
[AArch64] Combine UADDVs to generate vector add
Wed, Oct 14, 9:31 PM
mivnay closed D88731: [AArch64] Combine UADDVs to generate vector add.
Wed, Oct 14, 9:30 PM · Restricted Project
mivnay updated the diff for D88731: [AArch64] Combine UADDVs to generate vector add.
Wed, Oct 14, 4:43 AM · Restricted Project
mivnay added a comment to D88731: [AArch64] Combine UADDVs to generate vector add.

The pre-commit check is not being helpful here... :-/

Seems like a useful optimisation to me though.

Can you add test for i16 and i8. As far as I understand they will not fold because we will have legalized types and the return type will not match the vector element type?

Wed, Oct 14, 4:10 AM · Restricted Project
mivnay committed rGa6ad5930d5f5: [AArch64] Add more addv tests (authored by mivnay).
[AArch64] Add more addv tests
Wed, Oct 14, 3:00 AM
mivnay closed D89365: [AArch64] Add more addv tests.
Wed, Oct 14, 3:00 AM · Restricted Project

Tue, Oct 13

mivnay added a comment to D88731: [AArch64] Combine UADDVs to generate vector add.

Can you also run the update_llc_test_checks.py script on the file and pre-commit the tests, just showing the changes here.

Sure. I have created new patch : https://reviews.llvm.org/D89365

Tue, Oct 13, 10:48 PM · Restricted Project
mivnay requested review of D89365: [AArch64] Add more addv tests.
Tue, Oct 13, 10:44 PM · Restricted Project
mivnay committed rG37dce7475b2b: [AArch64] Identify SAD pattern (authored by mivnay).
[AArch64] Identify SAD pattern
Tue, Oct 13, 3:24 AM
mivnay closed D88742: [AArch64] Identify SAD pattern.
Tue, Oct 13, 3:24 AM · Restricted Project
mivnay added a comment to D88731: [AArch64] Combine UADDVs to generate vector add.

ping!

Tue, Oct 13, 1:15 AM · Restricted Project

Sun, Oct 11

mivnay updated the diff for D88742: [AArch64] Identify SAD pattern.

Adding support for SABD patterns..

Sun, Oct 11, 9:34 PM · Restricted Project

Sat, Oct 10

mivnay updated the diff for D88731: [AArch64] Combine UADDVs to generate vector add.

Add support for all integer types.

Sat, Oct 10, 3:37 AM · Restricted Project

Fri, Oct 9

mivnay retitled D88742: [AArch64] Identify SAD pattern from [AArch64] Identify SAD pattern for v16i8 type to [AArch64] Identify SAD pattern.
Fri, Oct 9, 3:35 AM · Restricted Project
mivnay updated the diff for D88742: [AArch64] Identify SAD pattern.
Fri, Oct 9, 3:35 AM · Restricted Project

Thu, Oct 8

mivnay added inline comments to D88742: [AArch64] Identify SAD pattern.
Thu, Oct 8, 7:12 AM · Restricted Project
mivnay added a comment to D88742: [AArch64] Identify SAD pattern.

Does this still only handle v16i8?

Thu, Oct 8, 6:59 AM · Restricted Project

Wed, Oct 7

mivnay added inline comments to D88819: [LV] Support for Remainder loop vectorization.
Wed, Oct 7, 7:41 AM · Restricted Project
mivnay updated the diff for D88819: [LV] Support for Remainder loop vectorization.

Fixed review comments

Wed, Oct 7, 7:41 AM · Restricted Project
mivnay added a comment to D88742: [AArch64] Identify SAD pattern.

What part of sadb are you worried about? I thought they could be treated the same, given you are extended from enough extra bits. But I may be mistaken, they can be somewhat difficult.

I am worried about the semantics of sabd instruction especially about v16i32 (sub(sext(v16i8), sext(v16i8))) to v16i32 sext(sub(v16i8, v16i8)) part. Can this be looked at later once my other patches are through?

The return type of the abd is positive unsigned, so I think if turning it into v16i32 zext(sabd(v16i8, v16i8)) should be valid? But it's worth testing that, I may be mistaken..

Wed, Oct 7, 3:27 AM · Restricted Project
mivnay abandoned D88897: [AArch64] WIP for review D88742.

Moved back to https://reviews.llvm.org/D88742

Wed, Oct 7, 3:23 AM · Restricted Project
mivnay added a comment to D88897: [AArch64] WIP for review D88742.

Right. Because of Global ISel? I don't know that very well and if there's a similar place to do the same thing, converting an intrinsics to the kind of node that would be matched.

My hope was that we would be able to make something that I could later lift to being target-independent and share across AArch64 and MVE.

Replicating the patterns doesn't feel great though, and disabling the test feels a bit of a step backwards. Perhaps make the pattern match either the node or the intrinsic:

def AArch64uabd_n   : SDNode<"AArch64ISD::UABD", SDT_AArch64binvec>;
def AArch64sabd_n   : SDNode<"AArch64ISD::SABD", SDT_AArch64binvec>;
def AArch64uabd     : PatFrags<(ops node:$lhs, node:$rhs),
                                   [(AArch64uabd_n node:$lhs, node:$rhs),
                                    (int_aarch64_neon_uabd node:$lhs, node:$rhs)]>;
def AArch64sabd     : PatFrags<(ops node:$lhs, node:$rhs),
                                   [(AArch64sabd_n node:$lhs, node:$rhs),
                                    (int_aarch64_neon_sabd node:$lhs, node:$rhs)]>;
Wed, Oct 7, 3:21 AM · Restricted Project
mivnay updated the diff for D88742: [AArch64] Identify SAD pattern.
Wed, Oct 7, 3:21 AM · Restricted Project

Tue, Oct 6

mivnay edited reviewers for D88819: [LV] Support for Remainder loop vectorization, added: ashutosh.nema; removed: Ashutosh.
Tue, Oct 6, 11:12 AM · Restricted Project
mivnay added a reviewer for D88819: [LV] Support for Remainder loop vectorization: Ashutosh.
Tue, Oct 6, 11:11 AM · Restricted Project
mivnay added a comment to D88819: [LV] Support for Remainder loop vectorization.

Did you consider supporting this naturally by just having LV re-visit the newly created remainder loops, i.e. remember the created remainder loops and add them to the top-level worklist https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp#L8587 ? We would need to make sure we do not visit them repeatedly, but overall we should be able to achieve the same goal, but without adding extra complexity to the vectorizer.

Blindly calling the vectorizer for the loop again is not optimal. The current change does that but at lower abstraction level. The majority of the changes are about setting the right overall CFG structure. Example: It is unnecessary to execute runtime checks twice, etc. "struct EpilogVectorLoopHelper" is just the carrier of information from the original vector loop generation to the epilog vector loop generation. Also, InnerLoopVectorizer doesn't expose the vector loop CFG structure to it's users. Fixing the CFG structure at the higher abstraction level exposes this class completely.

Tue, Oct 6, 11:10 AM · Restricted Project
mivnay added a comment to D88819: [LV] Support for Remainder loop vectorization.

Thanks for working on epilogue vectorization. Incidentally I've also looked into this recently. There has been a long and detailed discussion on the mailing list from back in 2017 about this transformation here http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-td106322.html. Your patch is able to vectorize epilogue loops with fairly small changes to the LV, however the generated CFG is not optimal. For example, while the SCEV and memory checks are not redundantly executed, they are statically duplicated in the code and increase code size unnecessarily. The trip count checks can also be generated in a way that shortens the critical path from the checks to the scalar loop, which is critical for loops that have a small trip count. Based on the followup discussions from the mentioned RFC, the optimal CFG should look more like what I've attached below.

Tue, Oct 6, 10:49 AM · Restricted Project
mivnay added a comment to D88742: [AArch64] Identify SAD pattern.

What part of sadb are you worried about? I thought they could be treated the same, given you are extended from enough extra bits. But I may be mistaken, they can be somewhat difficult.

Tue, Oct 6, 9:49 AM · Restricted Project
mivnay updated the diff for D88897: [AArch64] WIP for review D88742.
Tue, Oct 6, 9:28 AM · Restricted Project
mivnay added a comment to D88897: [AArch64] WIP for review D88742.

There's some code in performExtendCombine from the look of it, still looking at the intrinsics ID's.

Tue, Oct 6, 9:16 AM · Restricted Project
mivnay updated the diff for D88897: [AArch64] WIP for review D88742.
Tue, Oct 6, 9:09 AM · Restricted Project
mivnay added a comment to D88897: [AArch64] WIP for review D88742.

This change seems to be generating a different pattern (UABDLv4i32_v2i64 vs UABDLv2i32_v2i64) for one of the test cases in arm64-vabs.ll. I am not sure whether the new pattern generated is correct and better.

Tue, Oct 6, 7:02 AM · Restricted Project
mivnay added a comment to D88742: [AArch64] Identify SAD pattern.

So this converts v16i32 abs(sub(zext(v16i8), zext(v16i8)))) to v16i32 zext(abd(v16i8, v16i8))? And doesn't start with a trunk? That sounds like it should work OK.

Yes.

Tue, Oct 6, 6:55 AM · Restricted Project
mivnay requested review of D88897: [AArch64] WIP for review D88742.
Tue, Oct 6, 6:39 AM · Restricted Project

Mon, Oct 5

mivnay added a comment to D88819: [LV] Support for Remainder loop vectorization.

Did you consider supporting this naturally by just having LV re-visit the newly created remainder loops, i.e. remember the created remainder loops and add them to the top-level worklist https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp#L8587 ? We would need to make sure we do not visit them repeatedly, but overall we should be able to achieve the same goal, but without adding extra complexity to the vectorizer.

Mon, Oct 5, 8:12 AM · Restricted Project
mivnay added a comment to D88819: [LV] Support for Remainder loop vectorization.

The CFG after the optimization of a typical loop will be as follows:

Mon, Oct 5, 4:04 AM · Restricted Project
mivnay requested review of D88819: [LV] Support for Remainder loop vectorization.
Mon, Oct 5, 4:00 AM · Restricted Project
mivnay added a reviewer for D88742: [AArch64] Identify SAD pattern: fhahn.
Mon, Oct 5, 12:28 AM · Restricted Project

Fri, Oct 2

mivnay added reviewers for D88742: [AArch64] Identify SAD pattern: SjoerdMeijer, dmgreen.
Fri, Oct 2, 9:26 AM · Restricted Project
mivnay requested review of D88742: [AArch64] Identify SAD pattern.
Fri, Oct 2, 9:04 AM · Restricted Project
mivnay added a comment to D88577: [AArch64] Generate dot for v16i8 sum reduction to i32.

I can certainly commit this. I just need an author string to attribute it correctly. Is "Vinay Madhusudan <vinay@compilertree.com>" OK for that?

Fri, Oct 2, 8:27 AM · Restricted Project
mivnay updated the diff for D88731: [AArch64] Combine UADDVs to generate vector add.
Fri, Oct 2, 8:19 AM · Restricted Project
mivnay added a comment to D88577: [AArch64] Generate dot for v16i8 sum reduction to i32.

The build failure is unrelated to this patch and happening in other patches as well (Example: https://reviews.llvm.org/D88471). What should I do? Also, I do not have commit access. Can you please help in committing this patch?

Fri, Oct 2, 5:38 AM · Restricted Project
mivnay requested review of D88731: [AArch64] Combine UADDVs to generate vector add.
Fri, Oct 2, 5:31 AM · Restricted Project
mivnay updated the diff for D88577: [AArch64] Generate dot for v16i8 sum reduction to i32.
Fri, Oct 2, 3:47 AM · Restricted Project

Thu, Oct 1

mivnay updated the diff for D88577: [AArch64] Generate dot for v16i8 sum reduction to i32.

Fixed context and review comments

Thu, Oct 1, 11:13 PM · Restricted Project
mivnay retitled D88577: [AArch64] Generate dot for v16i8 sum reduction to i32 from [AArch64] Generate udot for v16i8 sum reduction to i32 to [AArch64] Generate dot for v16i8 sum reduction to i32.
Thu, Oct 1, 4:42 AM · Restricted Project
mivnay added inline comments to D88577: [AArch64] Generate dot for v16i8 sum reduction to i32.
Thu, Oct 1, 4:41 AM · Restricted Project
mivnay updated the diff for D88577: [AArch64] Generate dot for v16i8 sum reduction to i32.

Added support for SDOT

Thu, Oct 1, 4:36 AM · Restricted Project
mivnay added a comment to D88577: [AArch64] Generate dot for v16i8 sum reduction to i32.

Hello

Can you update with full context? -U999999. It makes phabriactor reviews easier to follow.

I had thought about this somewhat in reference to inloop reductions. I had presumed that it would need some form of partial reduction though, as you would want part of the reduction would then happen outside the loop (I think)

Improving codegen on it's own is good, but I'm interested in seeing how this fits with the other patches.

Thu, Oct 1, 3:29 AM · Restricted Project
mivnay updated the summary of D88577: [AArch64] Generate dot for v16i8 sum reduction to i32.
Thu, Oct 1, 3:18 AM · Restricted Project
mivnay added a comment to D88577: [AArch64] Generate dot for v16i8 sum reduction to i32.

I haven't looked in much detail at this patch, but this looks like some straightforward lowering of llvm.experimental.vector.reduce.add. Absolutely nothing wrong with that, but I am curious who's going to produce this intrinsic? The vectoriser, the matrix pass? In other words, any ideas on the bigger picture?

Thu, Oct 1, 2:51 AM · Restricted Project
mivnay added reviewers for D88577: [AArch64] Generate dot for v16i8 sum reduction to i32: SjoerdMeijer, dmgreen, fhahn.
Thu, Oct 1, 2:21 AM · Restricted Project

Wed, Sep 30

mivnay updated the diff for D88577: [AArch64] Generate dot for v16i8 sum reduction to i32.
Wed, Sep 30, 8:42 AM · Restricted Project
mivnay requested review of D88577: [AArch64] Generate dot for v16i8 sum reduction to i32.
Wed, Sep 30, 8:24 AM · Restricted Project

Oct 21 2016

mivnay retitled D25868: Loop Utils: Cloning any level loop nest from to Loop Utils: Cloning any level loop nest.
Oct 21 2016, 4:18 AM