Page MenuHomePhabricator

AaronLiu (Aaron H Liu)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 29 2020, 2:38 PM (27 w, 3 d)

Recent Activity

Jul 8 2020

AaronLiu removed a reviewer for D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions.: nikic.
Jul 8 2020, 9:10 AM · Unknown Object (Project), Restricted Project
AaronLiu added a comment to D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..

Thanks! Will keep eyes on it in the future.

Jul 8 2020, 5:41 AM · Unknown Object (Project), Restricted Project

Jul 7 2020

AaronLiu updated the diff for D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..

deleted: llvm/test/Transforms/PhaseOrdering/interleave_LV_SLP.ll
deleted: llvm/test/Transforms/PhaseOrdering/interleave_LV_SLP_false.ll
deleted: llvm/test/Transforms/SLPVectorizer/PowerPC/interleave_SLP.ll

Jul 7 2020, 2:37 PM · Unknown Object (Project), Restricted Project
AaronLiu added a comment to D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..

In that case, best understand why LV's cost model claims vectorizing the loop is not profitable, which you and SLP know it is; and ideally fix LV's cost model.
A crash due to forced vectorization sounds like a bug, which best be reported and/or fixed.
If cases with concrete "obstacles" are identified preventing LV from vectorizing a loop but allowing SLP to vectorize (part of) it, after LV interleaves the loop, such obstacles could potentially be used to (further) drive LV to interleave the loop.

Agree, ideally LV's cost model and its vectorization functionality should be improved in the future to be able to vectorize a lot more instructions.
We see some applications keep being crashed, due to some changes in LV and probably being fixed later on, or because of its own weakness in some aspects.
But all the above are beyond of this patch.
Currently, LV and SLP complement each other, and there are cases that LV fails to vectorize (functionally not being able to do it) but SLP succeed.

Hence the term "small loop" should be more specific; as in "vectorizer-min-trip-count" / "TinyTripCountVectorThreshold".

The "small or tiny" values are relative, and will keep on changing. In the situations we see, it is even more dynamic, the exact trip count is not known, but we know that it is relatively small.

Jul 7 2020, 2:15 PM · Unknown Object (Project), Restricted Project
AaronLiu added a comment to D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..

May I suggest we only test what this patch actually changes? This patch adds an option which when enabled allows interleaving of loops with small trip counts and scalar reductions, so it suffices to test exactly that. That should be covered by llvm/test/Transforms/LoopVectorize/PowerPC/interleave_IC.ll. I think the other test cases can be removed. IMHO adding more tests to make sure SLP vectorization happens (and the like) are redundant, add unnecessary maintenance in the future and are beyond the scope of what this patch is trying to do.

Jul 7 2020, 9:15 AM · Unknown Object (Project), Restricted Project
AaronLiu added a comment to D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..

In the application we try, LV refuse to vectorize due to not profitable, but if we force LV to vectorize and it will crash. Apparently there are some obstacles. There are cases that even if LV fails, SLP could succeed.
Yes, the term small loop is a little bit of confusing. For example a loop which has a small number of instructions but has a huge loop trip count, is the loop small or big? In our example, the loop trip count is small, and also the instruction number is small.

Jul 7 2020, 9:13 AM · Unknown Object (Project), Restricted Project

Jun 30 2020

AaronLiu updated the diff for D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..

Update testcases.

Jun 30 2020, 12:29 PM · Unknown Object (Project), Restricted Project
AaronLiu added a comment to D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..

In this patch we "Interleave to expose ILP". The whole purpose of this patch is to "expose ILP", and the approach is to "Interleave".

  • I worry about that if we remove those testcases, no vectorization(parallelism) results due to this patch can be seen, and people will have no idea where do we "expose ILP"?
  • We have ever discussed that what @spatel suggested adding two testcases under PhaseOrdering to show the baseline results with the option in this patch on and off "do make sense".
  • The purpose of adding lit tests is to catch future regressions, i.e., if someone make the vectorization not working, we will catch it with the testcases we added.
Jun 30 2020, 9:12 AM · Unknown Object (Project), Restricted Project

Jun 26 2020

AaronLiu added a comment to D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..

Currently all four testcases serve different purposes, and we can clearly see their differences:

  • PhaseOrdering/interleave_LV_SLP_false.ll gives the baseline result, which shows that with the option in this patch off, instructions are not being vectorized.
  • PhaseOrdering/interleave_LV_SLP.ll also gives the baseline result, which shows that with the option in this patch on, instructions are being vectorized. But which vectorizer make it work? We cannot tell.
  • Vectorize/LoopVectorize.cpp tells us that LV cannot vectorize the code, but interleave the instructions to expose ILP.
  • SLPVectorizer/PowerPC/interleave_SLP.ll demonstrates that after interleaving by LV, then SLP captures the opportunities and vectorize the instructions.
Jun 26 2020, 9:17 AM · Unknown Object (Project), Restricted Project
AaronLiu updated the diff for D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..

Update three testcases, and add one more testcase to this patch.

Jun 26 2020, 9:17 AM · Unknown Object (Project), Restricted Project

Jun 24 2020

AaronLiu added a comment to D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..

I think all the four testcases are related to this patch, and I prefer to keep the testcase in this patch. The four testcases all serve different purposes:

  • PhaseOrdering/interleave_LV_SLP.ll shows that with the option in this patch on, instructions were vectorized. But which vectorizer make it work? We cannot tell.
  • Vectorize/LoopVectorize.cpp tells us that LV cannot vectorize the code, but interleave the instructions to expose ILP.
  • SLPVectorizer/PowerPC/interleave_SLP.ll demonstrates that after interleaving by LV, then SLP captures the opportunities and vectorize the instructions.
  • The one you added in PhaseOrdering/interleave-vectorization.ll show that without this patch(equivalently with the option in this patch off), the same testcase both LV and SLP cannot vectorize the instructions.

This still isn't quite what I was hoping for. Can you just add "-interleave-small-loop-scalar-reduction=true" to the RUN lines of the file I added and update the CHECK lines using the script? I want this patch to show *the diff* in IR for the proposed code change. I can't easily tell what is changing by comparing 2 different files.

Jun 24 2020, 2:41 PM · Unknown Object (Project), Restricted Project

Jun 23 2020

AaronLiu added a comment to D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..

I add a test: llvm/test/Transforms/PhaseOrdering/interleave_LV_SLP.ll
Please let me know whether that is what you wanted.

Thanks - that was the start of what I requested, but not complete. I've added the test here using auto-generated CHECK lines that show baseline (without this patch) results:
rGdf79443

Please rebase and update that file using the script at llvm/utils/update_test_checks.py.
I don't think we need to duplicate the test in the SLP folder now that we have coverage for this example in PhaseOrdering, but if you think that is still useful, that can be added independently of this patch.

Jun 23 2020, 2:32 PM · Unknown Object (Project), Restricted Project
AaronLiu updated the diff for D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..
Jun 23 2020, 2:32 PM · Unknown Object (Project), Restricted Project

Jun 22 2020

AaronLiu added a comment to D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..

Ping...

Jun 22 2020, 9:09 AM · Unknown Object (Project), Restricted Project

Jun 17 2020

AaronLiu added a comment to D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..

IIUC, we should add a test under test/Transforms/PhaseOrdering with -O2 to show the cooperative effect of the 2 vectorizers rather than a stand-alone SLP test.
If you can push that test with full baseline CHECK lines and then apply this patch and show test diffs, that would make it much easier to tell what is intended with this patch.

Jun 17 2020, 5:17 PM · Unknown Object (Project), Restricted Project
AaronLiu updated the diff for D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..

Add llvm/test/Transforms/PhaseOrdering/interleave_LV_SLP.ll

Jun 17 2020, 5:17 PM · Unknown Object (Project), Restricted Project

Jun 16 2020

AaronLiu added a comment to D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..

We use PhaseOrdering tests to ensure that the end result of >1 IR pass (usually the entire pipeline of -O* settings) produces the expected result. That may be stretching the meaning of PhaseOrdering, but that would be less fragile than the stand-alone SLP test. This patch isn't changing anything in SLP, so the test you are adding to SLP is independent of this patch, right?

Jun 16 2020, 11:00 AM · Unknown Object (Project), Restricted Project
AaronLiu added a comment to D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..

IIUC, we should add a test under test/Transforms/PhaseOrdering with -O2 to show the cooperative effect of the 2 vectorizers rather than a stand-alone SLP test.
If you can push that test with full baseline CHECK lines and then apply this patch and show test diffs, that would make it much easier to tell what is intended with this patch.

Jun 16 2020, 9:54 AM · Unknown Object (Project), Restricted Project
AaronLiu added inline comments to D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..
Jun 16 2020, 9:21 AM · Unknown Object (Project), Restricted Project
AaronLiu updated the diff for D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..

Update comments to address code review.

Jun 16 2020, 8:16 AM · Unknown Object (Project), Restricted Project
AaronLiu updated the diff for D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..

Add comments to address code review.

Jun 16 2020, 7:42 AM · Unknown Object (Project), Restricted Project

Jun 15 2020

AaronLiu added inline comments to D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..
Jun 15 2020, 1:14 PM · Unknown Object (Project), Restricted Project
AaronLiu added a comment to D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..

Ping...

Jun 15 2020, 8:40 AM · Unknown Object (Project), Restricted Project

Jun 10 2020

AaronLiu added a comment to D80175: [PowerPC][MachineCombiner] reassociate fma to expose more ILP.

Can you add a hidden option with init false? You can turn it true later on.
So that people can try with your option off and on? Thanks!

This is not adding a new optimization but is increasing the scope of an existing optimization. So I don't think it is appropriate to add an option to control only this aspect of this. If there is a use case for allowing the user to have fine grained control over which combiner patterns to use, then we can add an enum option. But we need to have justification for why that is needed.

OTOH, I am not opposed to adding an option to turn off the machine combiner (i.e. change the value returned by PPCInstrInfo::useMachineCombiner()) - but that is orthogonal to this patch and can be done separately.

Jun 10 2020, 9:56 AM · Restricted Project
AaronLiu added a comment to D80175: [PowerPC][MachineCombiner] reassociate fma to expose more ILP.

Can you add a hidden option with init false? You can turn it true later on.
So that people can try with your option off and on? Thanks!

Have you met some issues with this being turned on? This is specific for PowerPC, I have verified that on PowerPC there is no deg for the patch on the benchmarks I run. If you just want to see the impact of this patch, I would suggest you comment out the two newly added lines in function PPCInstrInfo::getMachineCombinerPatterns, it will turn off this opt.

Jun 10 2020, 7:04 AM · Restricted Project

Jun 9 2020

AaronLiu added a comment to D67948: [LV] Interleaving should not exceed estimated loop trip count..

I test it on PowerPC, I request it to be tested by the community on different platform. Currently it is disabled by default, and will enable it in the next step.

Jun 9 2020, 3:31 PM · Restricted Project
AaronLiu added a comment to D67948: [LV] Interleaving should not exceed estimated loop trip count..

We have observed some performance regressions, presumably because the vectorized code started to kick-in on short estimated trip count loops (as opposed to skipping vector code and execute scalar code). We'll try following up with cost model tuning. I'm not too surprised if others also hit a similar issue. Overall, though, that's the right direction to head to.

This was fixed?

Jun 9 2020, 2:21 PM · Restricted Project
AaronLiu added a comment to D67948: [LV] Interleaving should not exceed estimated loop trip count..

Accidentally removed the message that I posted above. Re-post here: basically what I want say is to request reviewers for this patch to review another patch D81416 that touch the same file. Thanks!

Jun 9 2020, 12:07 PM · Restricted Project

Jun 8 2020

AaronLiu added a comment to D67948: [LV] Interleaving should not exceed estimated loop trip count..
Jun 8 2020, 11:25 PM · Restricted Project
AaronLiu added a comment to D80175: [PowerPC][MachineCombiner] reassociate fma to expose more ILP.

Can you add a hidden option with init false? You can turn it true later on.
So that people can try with your option off and on? Thanks!

Jun 8 2020, 11:09 PM · Restricted Project
AaronLiu updated the diff for D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..

A little bit of format change.

Jun 8 2020, 2:25 PM · Unknown Object (Project), Restricted Project
AaronLiu created D81416: [LV][SLP] Interleave to expose ILP for small loops with scalar reductions..
Jun 8 2020, 12:07 PM · Unknown Object (Project), Restricted Project