Page MenuHomePhabricator

mivnay (Vinay Madhusudan)
User

Projects

User does not belong to any projects.

User Details

User Since
Oct 21 2016, 3:46 AM (233 w, 1 d)

Recent Activity

Sat, Mar 13

mivnay added a comment to D93838: [LLVM] [SCCP] : Add Function Specialization pass.

I have also become interested in this work as this regularly shows up as difference between clang and gcc. For the first attempt to add a function specialisation pass in D36432 very rightfully questions were asked about compile-times and code-size, and these questions were repeated for this work. The table https://reviews.llvm.org/D36432#836883 shows good numbers, and I think in order to progress this work we need something similar. I.e. the approach here is a bit different and things may have changed. But this is also mentioned in the description:

The pass is still missing the profitability.

and

I am yet to do the complete SPEC run and measure the speed /size change.

I agree with @fhahn and others that this will be most useful if it gets enabled by default.

Inline cost model works for the function pointer like arguments. We are specializing other type arguments as well. I have a patch ready for proposal but waiting for D93762 to be checked-in first..

My impression is that this is not the right order of events. To address concerns about regressions, we first need to get some data on the table first I guess. Then, ideas can be developed to get it enabled by default. If that path looks credible, I am sure we get D93762 approved/committed in no time. D93762 is thus not going to be blocker here.

And what should support this work, is that GCC has this enabled from -O3 and up, thus showing that this should be possible.

Sat, Mar 13, 2:18 AM · Restricted Project
mivnay updated the diff for D93838: [LLVM] [SCCP] : Add Function Specialization pass.

I have updated the full patch along with the cost model. I have also shared the perf numbers for SPEC CPU 2017 benchmarks in Graviton2 with various modes of optimization enabled :

Sat, Mar 13, 2:15 AM · Restricted Project

Mar 5 2021

mivnay added inline comments to D93838: [LLVM] [SCCP] : Add Function Specialization pass.
Mar 5 2021, 10:04 PM · Restricted Project
mivnay added a comment to D93838: [LLVM] [SCCP] : Add Function Specialization pass.

@mivnay

Could you please rebase the code? The latest SCCP code have a few changes.

Mar 5 2021, 10:02 PM · Restricted Project

Feb 20 2021

mivnay added a comment to D93838: [LLVM] [SCCP] : Add Function Specialization pass.

hi @mivnay, could you please rebase this and D93762 ? So we can test it.
It say that this patch can improve mcf 8%, we are interested at this because we also found that mcf has performance regression caused by ILP data dependency.

Feb 20 2021, 4:07 AM · Restricted Project
mivnay added a comment to D93762: SCCP: Refactor SCCPSolver.

This looks good to me. @fhahn are you happy with the suggested approach to tighten up the SCCPSolver class in subsequent patches, when function inlining starts using it?

IMO we should nail down the interface first, rather than starting with exposing/moving everything, to avoid a lot of unnecessary churn.

Feb 20 2021, 3:27 AM · Restricted Project

Feb 4 2021

mivnay added inline comments to D93762: SCCP: Refactor SCCPSolver.
Feb 4 2021, 9:17 PM · Restricted Project
mivnay updated the diff for D93762: SCCP: Refactor SCCPSolver.
Feb 4 2021, 9:08 PM · Restricted Project

Feb 2 2021

mivnay updated the diff for D93762: SCCP: Refactor SCCPSolver.

Move SCCPSolver to new header and .cpp file.

Feb 2 2021, 4:09 AM · Restricted Project

Jan 27 2021

mivnay added a comment to D93838: [LLVM] [SCCP] : Add Function Specialization pass.

Hi @mivnay,

As the approach here is pretty much identical to D36432, do you have any plans on the cost model for this pass? The earlier proposal seems to have been abandoned due to a lack of good heuristics to trade off between the (potentially huge) increase in code size (as well as compile time) and performance improvements (which may or may not arise due to later optimizations). Perhaps it would help to look at what trade-offs GCC makes? Specifically, it would be good to have an idea of how you would implement getSpecializationCost and getSpecializationBonus.

To help with review, could you make sure this patch is based on top of D93762 so that those changes don't appear here again?

Thanks!

Jan 27 2021, 10:19 PM · Restricted Project

Jan 7 2021

mivnay added a comment to D93762: SCCP: Refactor SCCPSolver.

ping!

Jan 7 2021, 9:03 AM · Restricted Project
mivnay added a comment to D93838: [LLVM] [SCCP] : Add Function Specialization pass.

ping!

Jan 7 2021, 9:03 AM · Restricted Project

Dec 27 2020

mivnay added a comment to D93762: SCCP: Refactor SCCPSolver.

@mivnay It would be helpful to already put up the patch for your intended use case (as a dependent revision), so it's possible to evaluate this in context.

Dec 27 2020, 7:29 AM · Restricted Project
mivnay requested review of D93838: [LLVM] [SCCP] : Add Function Specialization pass.
Dec 27 2020, 7:25 AM · Restricted Project

Dec 23 2020

mivnay added a comment to D93762: SCCP: Refactor SCCPSolver.

I think it makes sense to expose this more widely in general. But could you share the places where you intend to make use of it?

I think we should restrict the public interface exposed to what really is required for external clients, and keep other details in the .cpp file (e.g. most of the members related to function/argument tracking and the worklists could stay in the .cpp

Dec 23 2020, 8:18 AM · Restricted Project
mivnay requested review of D93762: SCCP: Refactor SCCPSolver.
Dec 23 2020, 7:05 AM · Restricted Project

Dec 8 2020

mivnay added a comment to D89566: [LV] Epilogue Vectorization with Optimal Control Flow.

Should this really be enabled by default given the current (lack of) cost modelling? Primarily concerned about the code size regression this causes.

Please note that functions with optsize and minsize attributes are not affected. I'm ok with turning it back off (that was my preference as well), but curious about what has regressed (since that information can help with cost-modeling). Any comments @mivnay @xbolva00 ?

Dec 8 2020, 2:12 AM · Restricted Project

Nov 9 2020

mivnay added a comment to D89566: [LV] Epilogue Vectorization with Optimal Control Flow.

I also think "blocking" is not the right terminology here. But I was asking about this because first we had D88819, then came this one, so we have 2 options. In the end it's pretty simple I guess, because we go for the one with the best code-gen, and this patch looks like the one with the most potential. But would be nice to get that confirmed.

On Thunderx2 , x264, 1c, rate :
Base : 3.31
With this patch : 3.50
With D88819 : 3.60

Nov 9 2020, 2:27 AM · Restricted Project

Oct 29 2020

mivnay added inline comments to D89566: [LV] Epilogue Vectorization with Optimal Control Flow.
Oct 29 2020, 9:01 AM · Restricted Project

Oct 27 2020

mivnay committed rG3fa20baf009d: [Flang][OpenMP 4.5] Add semantic check for OpenMP default clause (authored by yhegde).
[Flang][OpenMP 4.5] Add semantic check for OpenMP default clause
Oct 27 2020, 9:39 AM

Oct 26 2020

mivnay added inline comments to D89566: [LV] Epilogue Vectorization with Optimal Control Flow.
Oct 26 2020, 9:00 AM · Restricted Project

Oct 15 2020

mivnay added a comment to D88819: [LV] Support for Remainder loop vectorization.

I will try to summarize the current changes done in the below C code and also try to answer some of the common questions raised.

Oct 15 2020, 8:07 AM · Restricted Project
mivnay added a comment to D88819: [LV] Support for Remainder loop vectorization.

ping!

Oct 15 2020, 2:45 AM · Restricted Project

Oct 14 2020

mivnay committed rG159b2d8e62b8: [AArch64] Combine UADDVs to generate vector add (authored by mivnay).
[AArch64] Combine UADDVs to generate vector add
Oct 14 2020, 9:31 PM
mivnay closed D88731: [AArch64] Combine UADDVs to generate vector add.
Oct 14 2020, 9:30 PM · Restricted Project
mivnay updated the diff for D88731: [AArch64] Combine UADDVs to generate vector add.
Oct 14 2020, 4:43 AM · Restricted Project
mivnay added a comment to D88731: [AArch64] Combine UADDVs to generate vector add.

The pre-commit check is not being helpful here... :-/

Seems like a useful optimisation to me though.

Can you add test for i16 and i8. As far as I understand they will not fold because we will have legalized types and the return type will not match the vector element type?

Oct 14 2020, 4:10 AM · Restricted Project
mivnay committed rGa6ad5930d5f5: [AArch64] Add more addv tests (authored by mivnay).
[AArch64] Add more addv tests
Oct 14 2020, 3:00 AM
mivnay closed D89365: [AArch64] Add more addv tests.
Oct 14 2020, 3:00 AM · Restricted Project

Oct 13 2020

mivnay added a comment to D88731: [AArch64] Combine UADDVs to generate vector add.

Can you also run the update_llc_test_checks.py script on the file and pre-commit the tests, just showing the changes here.

Sure. I have created new patch : https://reviews.llvm.org/D89365

Oct 13 2020, 10:48 PM · Restricted Project
mivnay requested review of D89365: [AArch64] Add more addv tests.
Oct 13 2020, 10:44 PM · Restricted Project
mivnay committed rG37dce7475b2b: [AArch64] Identify SAD pattern (authored by mivnay).
[AArch64] Identify SAD pattern
Oct 13 2020, 3:24 AM
mivnay closed D88742: [AArch64] Identify SAD pattern.
Oct 13 2020, 3:24 AM · Restricted Project
mivnay added a comment to D88731: [AArch64] Combine UADDVs to generate vector add.

ping!

Oct 13 2020, 1:15 AM · Restricted Project

Oct 11 2020

mivnay updated the diff for D88742: [AArch64] Identify SAD pattern.

Adding support for SABD patterns..

Oct 11 2020, 9:34 PM · Restricted Project

Oct 10 2020

mivnay updated the diff for D88731: [AArch64] Combine UADDVs to generate vector add.

Add support for all integer types.

Oct 10 2020, 3:37 AM · Restricted Project

Oct 9 2020

mivnay retitled D88742: [AArch64] Identify SAD pattern from [AArch64] Identify SAD pattern for v16i8 type to [AArch64] Identify SAD pattern.
Oct 9 2020, 3:35 AM · Restricted Project
mivnay updated the diff for D88742: [AArch64] Identify SAD pattern.
Oct 9 2020, 3:35 AM · Restricted Project

Oct 8 2020

mivnay added inline comments to D88742: [AArch64] Identify SAD pattern.
Oct 8 2020, 7:12 AM · Restricted Project
mivnay added a comment to D88742: [AArch64] Identify SAD pattern.

Does this still only handle v16i8?

Oct 8 2020, 6:59 AM · Restricted Project

Oct 7 2020

mivnay added inline comments to D88819: [LV] Support for Remainder loop vectorization.
Oct 7 2020, 7:41 AM · Restricted Project
mivnay updated the diff for D88819: [LV] Support for Remainder loop vectorization.

Fixed review comments

Oct 7 2020, 7:41 AM · Restricted Project
mivnay added a comment to D88742: [AArch64] Identify SAD pattern.

What part of sadb are you worried about? I thought they could be treated the same, given you are extended from enough extra bits. But I may be mistaken, they can be somewhat difficult.

I am worried about the semantics of sabd instruction especially about v16i32 (sub(sext(v16i8), sext(v16i8))) to v16i32 sext(sub(v16i8, v16i8)) part. Can this be looked at later once my other patches are through?

The return type of the abd is positive unsigned, so I think if turning it into v16i32 zext(sabd(v16i8, v16i8)) should be valid? But it's worth testing that, I may be mistaken..

Oct 7 2020, 3:27 AM · Restricted Project
mivnay abandoned D88897: [AArch64] WIP for review D88742.

Moved back to https://reviews.llvm.org/D88742

Oct 7 2020, 3:23 AM · Restricted Project
mivnay added a comment to D88897: [AArch64] WIP for review D88742.

Right. Because of Global ISel? I don't know that very well and if there's a similar place to do the same thing, converting an intrinsics to the kind of node that would be matched.

My hope was that we would be able to make something that I could later lift to being target-independent and share across AArch64 and MVE.

Replicating the patterns doesn't feel great though, and disabling the test feels a bit of a step backwards. Perhaps make the pattern match either the node or the intrinsic:

def AArch64uabd_n   : SDNode<"AArch64ISD::UABD", SDT_AArch64binvec>;
def AArch64sabd_n   : SDNode<"AArch64ISD::SABD", SDT_AArch64binvec>;
def AArch64uabd     : PatFrags<(ops node:$lhs, node:$rhs),
                                   [(AArch64uabd_n node:$lhs, node:$rhs),
                                    (int_aarch64_neon_uabd node:$lhs, node:$rhs)]>;
def AArch64sabd     : PatFrags<(ops node:$lhs, node:$rhs),
                                   [(AArch64sabd_n node:$lhs, node:$rhs),
                                    (int_aarch64_neon_sabd node:$lhs, node:$rhs)]>;
Oct 7 2020, 3:21 AM · Restricted Project
mivnay updated the diff for D88742: [AArch64] Identify SAD pattern.
Oct 7 2020, 3:21 AM · Restricted Project

Oct 6 2020

mivnay edited reviewers for D88819: [LV] Support for Remainder loop vectorization, added: ashutosh.nema; removed: Ashutosh.
Oct 6 2020, 11:12 AM · Restricted Project
mivnay added a reviewer for D88819: [LV] Support for Remainder loop vectorization: Ashutosh.
Oct 6 2020, 11:11 AM · Restricted Project
mivnay added a comment to D88819: [LV] Support for Remainder loop vectorization.

Did you consider supporting this naturally by just having LV re-visit the newly created remainder loops, i.e. remember the created remainder loops and add them to the top-level worklist https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp#L8587 ? We would need to make sure we do not visit them repeatedly, but overall we should be able to achieve the same goal, but without adding extra complexity to the vectorizer.

Blindly calling the vectorizer for the loop again is not optimal. The current change does that but at lower abstraction level. The majority of the changes are about setting the right overall CFG structure. Example: It is unnecessary to execute runtime checks twice, etc. "struct EpilogVectorLoopHelper" is just the carrier of information from the original vector loop generation to the epilog vector loop generation. Also, InnerLoopVectorizer doesn't expose the vector loop CFG structure to it's users. Fixing the CFG structure at the higher abstraction level exposes this class completely.

Oct 6 2020, 11:10 AM · Restricted Project
mivnay added a comment to D88819: [LV] Support for Remainder loop vectorization.

Thanks for working on epilogue vectorization. Incidentally I've also looked into this recently. There has been a long and detailed discussion on the mailing list from back in 2017 about this transformation here http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-td106322.html. Your patch is able to vectorize epilogue loops with fairly small changes to the LV, however the generated CFG is not optimal. For example, while the SCEV and memory checks are not redundantly executed, they are statically duplicated in the code and increase code size unnecessarily. The trip count checks can also be generated in a way that shortens the critical path from the checks to the scalar loop, which is critical for loops that have a small trip count. Based on the followup discussions from the mentioned RFC, the optimal CFG should look more like what I've attached below.

Oct 6 2020, 10:49 AM · Restricted Project
mivnay added a comment to D88742: [AArch64] Identify SAD pattern.

What part of sadb are you worried about? I thought they could be treated the same, given you are extended from enough extra bits. But I may be mistaken, they can be somewhat difficult.

Oct 6 2020, 9:49 AM · Restricted Project
mivnay updated the diff for D88897: [AArch64] WIP for review D88742.
Oct 6 2020, 9:28 AM · Restricted Project
mivnay added a comment to D88897: [AArch64] WIP for review D88742.

There's some code in performExtendCombine from the look of it, still looking at the intrinsics ID's.

Oct 6 2020, 9:16 AM · Restricted Project
mivnay updated the diff for D88897: [AArch64] WIP for review D88742.
Oct 6 2020, 9:09 AM · Restricted Project
mivnay added a comment to D88897: [AArch64] WIP for review D88742.

This change seems to be generating a different pattern (UABDLv4i32_v2i64 vs UABDLv2i32_v2i64) for one of the test cases in arm64-vabs.ll. I am not sure whether the new pattern generated is correct and better.

Oct 6 2020, 7:02 AM · Restricted Project
mivnay added a comment to D88742: [AArch64] Identify SAD pattern.

So this converts v16i32 abs(sub(zext(v16i8), zext(v16i8)))) to v16i32 zext(abd(v16i8, v16i8))? And doesn't start with a trunk? That sounds like it should work OK.

Yes.

Oct 6 2020, 6:55 AM · Restricted Project
mivnay requested review of D88897: [AArch64] WIP for review D88742.
Oct 6 2020, 6:39 AM · Restricted Project

Oct 5 2020

mivnay added a comment to D88819: [LV] Support for Remainder loop vectorization.

Did you consider supporting this naturally by just having LV re-visit the newly created remainder loops, i.e. remember the created remainder loops and add them to the top-level worklist https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp#L8587 ? We would need to make sure we do not visit them repeatedly, but overall we should be able to achieve the same goal, but without adding extra complexity to the vectorizer.

Oct 5 2020, 8:12 AM · Restricted Project
mivnay added a comment to D88819: [LV] Support for Remainder loop vectorization.

The CFG after the optimization of a typical loop will be as follows:

Oct 5 2020, 4:04 AM · Restricted Project
mivnay requested review of D88819: [LV] Support for Remainder loop vectorization.
Oct 5 2020, 4:00 AM · Restricted Project
mivnay added a reviewer for D88742: [AArch64] Identify SAD pattern: fhahn.
Oct 5 2020, 12:28 AM · Restricted Project

Oct 2 2020

mivnay added reviewers for D88742: [AArch64] Identify SAD pattern: SjoerdMeijer, dmgreen.
Oct 2 2020, 9:26 AM · Restricted Project
mivnay requested review of D88742: [AArch64] Identify SAD pattern.
Oct 2 2020, 9:04 AM · Restricted Project
mivnay added a comment to D88577: [AArch64] Generate dot for v16i8 sum reduction to i32.

I can certainly commit this. I just need an author string to attribute it correctly. Is "Vinay Madhusudan <vinay@compilertree.com>" OK for that?

Oct 2 2020, 8:27 AM · Restricted Project
mivnay updated the diff for D88731: [AArch64] Combine UADDVs to generate vector add.
Oct 2 2020, 8:19 AM · Restricted Project
mivnay added a comment to D88577: [AArch64] Generate dot for v16i8 sum reduction to i32.

The build failure is unrelated to this patch and happening in other patches as well (Example: https://reviews.llvm.org/D88471). What should I do? Also, I do not have commit access. Can you please help in committing this patch?

Oct 2 2020, 5:38 AM · Restricted Project
mivnay requested review of D88731: [AArch64] Combine UADDVs to generate vector add.
Oct 2 2020, 5:31 AM · Restricted Project
mivnay updated the diff for D88577: [AArch64] Generate dot for v16i8 sum reduction to i32.
Oct 2 2020, 3:47 AM · Restricted Project

Oct 1 2020

mivnay updated the diff for D88577: [AArch64] Generate dot for v16i8 sum reduction to i32.

Fixed context and review comments

Oct 1 2020, 11:13 PM · Restricted Project
mivnay retitled D88577: [AArch64] Generate dot for v16i8 sum reduction to i32 from [AArch64] Generate udot for v16i8 sum reduction to i32 to [AArch64] Generate dot for v16i8 sum reduction to i32.
Oct 1 2020, 4:42 AM · Restricted Project
mivnay added inline comments to D88577: [AArch64] Generate dot for v16i8 sum reduction to i32.
Oct 1 2020, 4:41 AM · Restricted Project
mivnay updated the diff for D88577: [AArch64] Generate dot for v16i8 sum reduction to i32.

Added support for SDOT

Oct 1 2020, 4:36 AM · Restricted Project
mivnay added a comment to D88577: [AArch64] Generate dot for v16i8 sum reduction to i32.

Hello

Can you update with full context? -U999999. It makes phabriactor reviews easier to follow.

I had thought about this somewhat in reference to inloop reductions. I had presumed that it would need some form of partial reduction though, as you would want part of the reduction would then happen outside the loop (I think)

Improving codegen on it's own is good, but I'm interested in seeing how this fits with the other patches.

Oct 1 2020, 3:29 AM · Restricted Project
mivnay updated the summary of D88577: [AArch64] Generate dot for v16i8 sum reduction to i32.
Oct 1 2020, 3:18 AM · Restricted Project
mivnay added a comment to D88577: [AArch64] Generate dot for v16i8 sum reduction to i32.

I haven't looked in much detail at this patch, but this looks like some straightforward lowering of llvm.experimental.vector.reduce.add. Absolutely nothing wrong with that, but I am curious who's going to produce this intrinsic? The vectoriser, the matrix pass? In other words, any ideas on the bigger picture?

Oct 1 2020, 2:51 AM · Restricted Project
mivnay added reviewers for D88577: [AArch64] Generate dot for v16i8 sum reduction to i32: SjoerdMeijer, dmgreen, fhahn.
Oct 1 2020, 2:21 AM · Restricted Project

Sep 30 2020

mivnay updated the diff for D88577: [AArch64] Generate dot for v16i8 sum reduction to i32.
Sep 30 2020, 8:42 AM · Restricted Project
mivnay requested review of D88577: [AArch64] Generate dot for v16i8 sum reduction to i32.
Sep 30 2020, 8:24 AM · Restricted Project

Oct 21 2016

mivnay retitled D25868: Loop Utils: Cloning any level loop nest from to Loop Utils: Cloning any level loop nest.
Oct 21 2016, 4:18 AM