- User Since
- Sep 17 2015, 10:06 AM (205 w, 1 d)
Some of the results are worse than without the patch. Can we investigate why?
Yes, I will focus on 401.bzip2 and 429.mcf. But SLP throttling is supposed to be off by default; it is on here just to show the impact on tests.
Here are the SPEC CPU 2006 INT results:
Mon, Aug 19
I wanted to see if this patch would help with a problem that I'm looking at, but I get this error when trying to build:
Yes, it should be "llvm::". I will rebase the patch, and I am thinking of suggesting this solution to you in D64142.
Fri, Aug 16
Thu, Aug 15
looks good to me.
Wed, Aug 14
Fixed the formatting issue at line 1308.
Mon, Aug 12
Mon, Aug 5
Fixed all previous remarks, rebased.
Fri, Aug 2
Tue, Jul 30
Fixed typo at line 123.
Sun, Jul 28
Fri, Jul 26
Rebased, fixed previous remarks. Here we still have InstructionsState; I have the full change with the state in TreeEntry, but I prefer to break it into several reviews.
Jul 23 2019
Jul 22 2019
Ping. If approved, I am going to commit with "slp-throttling" disabled by default; it is enabled here just to show the impact on the tests.
Jul 15 2019
Jul 13 2019
Jul 12 2019
Jul 6 2019
Jul 3 2019
Overall, I don't see any issues with linking TreeEntry and ScheduleData with lane information.
Jul 2 2019
Rebased, updated incorrect comment in treeTraversal() function.
Jun 29 2019
Fixed the issue with visiting the same node twice. Also added an assertion that we are not visiting any node twice and an assertion that all proposed nodes were visited.
Fixed the incorrect algorithm implementation in the findSubTree() function.
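The two invariants mentioned above (no node visited twice, every proposed node visited) can be illustrated with a small standalone sketch. This is not the patch's code: `Graph`, `visitOnce()` and `coversExactlyOnce()` are hypothetical names, and the graph is a plain adjacency list rather than the SLP vectorizer's TreeEntry structures.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for the vectorizable tree/graph:
// Children[N] lists the successor indices of node N.
using Graph = std::vector<std::vector<std::size_t>>;

// Depth-first walk from Root that records every reachable node exactly once,
// even when a node is reachable through several paths.
std::vector<std::size_t> visitOnce(const Graph &Children, std::size_t Root) {
  std::vector<std::size_t> Order;
  std::set<std::size_t> Visited;
  std::vector<std::size_t> Worklist{Root};
  while (!Worklist.empty()) {
    std::size_t N = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(N).second)
      continue; // Already handled through another path.
    Order.push_back(N);
    for (std::size_t C : Children[N])
      Worklist.push_back(C);
  }
  return Order;
}

// Returns true when Order contains no duplicates and includes every node
// listed in Proposed. These are the two conditions one would assert on.
bool coversExactlyOnce(const std::vector<std::size_t> &Order,
                       const std::vector<std::size_t> &Proposed) {
  std::set<std::size_t> Seen;
  for (std::size_t N : Order)
    if (!Seen.insert(N).second)
      return false; // A node was visited twice.
  for (std::size_t P : Proposed)
    if (!Seen.count(P))
      return false; // A proposed node was never visited.
  return true;
}
```

In a debug build one might wrap the check as `assert(coversExactlyOnce(Order, Proposed) && "bad traversal");` after the walk finishes.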
Jun 26 2019
Rebased and addressed previous remarks.
Jun 13 2019
Jun 12 2019
Hmm, I have another failure with this change on my setup; now it is PR39774.ll. It is probably a difference in the sort algorithm or something similar, since it is just a swap of two inserts. I don't have this failure if PR39774.ll is intact.
Jun 11 2019
fixed a typo.
Rebased, fixed all remarks.
Jun 6 2019
Jun 5 2019
Jun 4 2019
looks good to me.
Jun 3 2019
Jun 1 2019
Addressing last remarks, rebased.
May 28 2019
May 23 2019
Oh, I noticed some regressions with the change in AArch64/matmul.ll and AArch64/transpose.ll. Maybe there is a way to isolate it with heuristics?
May 22 2019
May 21 2019
Yes, looks good.
May 20 2019
May 19 2019
May 18 2019
Looks good to me too.
May 16 2019
Added comments to the cutBranch() and extendSubTree() functions. Removed the duplicate variable "I" in the extendSubTree() function.
Again, slp-throttling is on just to show the impact on SLP tests.
Looks like I fixed all remarks; rebased.
May 10 2019
May 9 2019
Looks like the "class nodes_iterator" definition at line 1856 is not formatted properly.
May 6 2019
Found a typo in the previous upload.
Rebased. Simplified the solution by removing the leaves structure and any notion of branches. Left the "slp-throttling" flag true by default just to show the impact on SLP tests.
Apr 30 2019
Implemented tree/graph throttling solution.
Mar 29 2019
Mar 25 2019
Looks good to me. Abataev, any objections?
Mar 22 2019
Changed the getTreeCost loop condition to "--I;".
Fixed remarks, removed the Cost state from TreeEntry, and made the cl::opt throttling flag false by default.
Mar 21 2019
Fixed the cost estimation for canceled elements, which have to be inserted in order to interact with vectorized ones.
Replaced the main throttling loop with a reverse post-order traversal, as suggested in the previous remarks.
By making the final throttling decision inside getTreeCost(), we don't need to hold cost estimations in TreeEntry.
Avoid vectorizing any flow instructions during partial vectorization.
Added a cl::opt flag for throttling. It is now on by default to show the impact on tests, but at commit time I could make this option off by default to estimate the whole impact before enabling it everywhere.
Here are the SPEC CPU 2006 numbers before and after:
Mar 19 2019
Mar 14 2019
Delete unnecessary variables from getTreeCost().
Mar 13 2019
Rebase the change, fix Vasileios's remarks.
Mar 8 2019
Add cost calculation for operations in the vectorizable tree that have to be extracted for use by the canceled tree elements. Fix the performance regression on Phoronix's c-ray-1.2.0 test-suite by avoiding vectorization of just one seed instruction. SPEC 2006 numbers are unchanged: there are several instances of new vectorizations, but they are not hot enough to change the numbers. Also, I noticed a good vectorization example in Phoronix's encode-mp3-1.7.3: two FFT kernels, with the fft_short kernel ~8% faster and fft_long ~3% faster on an Intel(R) Core(TM) i7-6700HQ.
Mar 1 2019
Change the decision to reduce the tree only if the cost of vectorizing is too high, and reduce the tree to the highest element of the tree that is minimally profitable.
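One possible reading of this reduction step, as a standalone sketch rather than the patch's actual code: treat the tree as an ordered list of entries (root first), and when the full tree is not profitable, drop entries from the leaf end until the remainder is. `EntryCost`, `CutCost`, `reduceToProfitablePrefix()` and the "total cost below minus the threshold" test are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-entry cost record. Negative values mean that vectorizing
// the entry saves cost compared to the scalar code.
struct EntryCost {
  int VectorizeCost; // Cost delta of vectorizing this entry.
  int CutCost;       // Extra cost paid if the tree is cut right below it.
};

// Returns how many leading entries to keep. Returns 0 when no prefix of the
// tree is profitable, i.e. nothing should be vectorized.
std::size_t reduceToProfitablePrefix(const std::vector<EntryCost> &Entries,
                                     int Threshold) {
  // Prefix[I] holds the summed VectorizeCost of the first I entries.
  std::vector<int> Prefix(Entries.size() + 1, 0);
  for (std::size_t I = 0; I < Entries.size(); ++I)
    Prefix[I + 1] = Prefix[I] + Entries[I].VectorizeCost;

  // Try the full tree first, then progressively smaller prefixes.
  for (std::size_t Keep = Entries.size(); Keep > 0; --Keep) {
    // Keeping the whole tree needs no cut, so no extra cost is added.
    int Cut = Keep == Entries.size() ? 0 : Entries[Keep - 1].CutCost;
    if (Prefix[Keep] + Cut < -Threshold)
      return Keep;
  }
  return 0;
}
```

For example, with entry costs -4, -3 and +10 and a threshold of 0, the full tree costs +3 and is rejected, while keeping the first two entries costs -7 plus that entry's cut cost, so the sketch keeps two entries if the cut cost is below 7.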
Feb 26 2019
Sorry, I found my previous results not quite reliable. I have changed my methodology and will rerun the SPEC benchmarks.
Fix the cost calculation, fix the run-time error by adding external users for canceled elements in reduceTree(), and add a test case.