- User Since
- Sep 17 2015, 10:06 AM (196 w, 6 d)
Thu, Jun 13
Wed, Jun 12
Hmm, I have another failure with this change on my setup; now it is PR39774.ll. It is probably a sort algorithm difference or something similar, since it is just a swap of two inserts. I don't see this failure if PR39774.ll is left intact.
Tue, Jun 11
Fixed a typo.
Rebased, fixed all remarks.
Thu, Jun 6
Wed, Jun 5
Tue, Jun 4
looks good to me.
Mon, Jun 3
Sat, Jun 1
Addressing last remarks, rebased.
Tue, May 28
May 23 2019
Oh, I noticed some regressions with the change in AArch64/matmul.ll and AArch64/transpose.ll. Maybe there is a way to isolate them with heuristics?
May 22 2019
May 21 2019
Yes, looks good.
May 20 2019
May 19 2019
May 18 2019
Looks good to me too.
May 16 2019
Added comments to the cutBranch() and extendSubTree() functions. Removed the duplicate variable "I" in extendSubTree().
Again, slp-throttling is on just to show the impact on SLP tests.
Looks like I fixed all remarks; rebased.
May 10 2019
May 9 2019
Looks like the "class nodes_iterator" definition at line 1856 is not formatted properly.
May 6 2019
Found typo in the previous upload.
Rebased. Simplified solution by removing leaves structure and any notion of branches. Left "slp-throttling" flag true by default just to show the impact on SLP tests.
Apr 30 2019
Implemented tree/graph throttling solution.
Mar 29 2019
Mar 25 2019
Looks good to me. Abataev, any objections?
Mar 22 2019
Changed the getTreeCost() loop condition to "--I;".
Fixed remarks, removed Cost state from TreeEntry, turned cl::opt throttling flag false by default.
Mar 21 2019
Fixed cost estimation for canceled elements that have to be inserted to interact with vectorized ones.
Replaced the main throttling loop with a reverse-post-order traversal, as suggested in the previous remarks.
By making the final throttling decision inside getTreeCost(), we don't need to hold cost estimations in TreeEntry.
Avoid vectorizing any flow instructions during partial vectorization.
Added a cl::opt flag for throttling. It is now on by default to show the impact on tests, but at commit time I could turn this option off by default to estimate the whole impact before enabling it everywhere.
Here are the SPEC CPU 2006 numbers before and after:
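For readers unfamiliar with the reverse-post-order traversal mentioned in the notes above, here is a minimal standalone sketch. It is not the code from the patch: `Graph`, `postOrder`, and `reversePostOrder` are hypothetical names, and the tree is assumed to be a plain adjacency list from each node to its operands.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical adjacency list: G[N] holds the operand nodes of node N.
using Graph = std::vector<std::vector<std::size_t>>;

// Depth-first walk that records a node only after all of its operands.
static void postOrder(const Graph &G, std::size_t N, std::vector<bool> &Seen,
                      std::vector<std::size_t> &Out) {
  if (Seen[N])
    return;
  Seen[N] = true;
  for (std::size_t Operand : G[N])
    postOrder(G, Operand, Seen, Out);
  Out.push_back(N);
}

// Reverse post-order: every node appears before the nodes it points to
// (for an acyclic graph), starting from Root.
std::vector<std::size_t> reversePostOrder(const Graph &G, std::size_t Root) {
  assert(Root < G.size() && "root must be a valid node index");
  std::vector<bool> Seen(G.size(), false);
  std::vector<std::size_t> Order;
  postOrder(G, Root, Seen, Order);
  std::reverse(Order.begin(), Order.end());
  return Order;
}
```

On the diamond graph `{{1, 2}, {3}, {3}, {}}` with root 0 this yields the order 0, 2, 1, 3, so a node is always visited before its operands.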
Mar 19 2019
Mar 14 2019
Delete unnecessary variables from getTreeCost().
Mar 13 2019
Rebased the change, fixed Vasileios' remarks.
Mar 8 2019
Added cost calculation for operations in the vectorizable tree that have to be extracted for use by canceled tree elements. Fixed a performance regression on Phoronix's c-ray-1.2.0 test-suite by not vectorizing just a single seed instruction. SPEC 2006 numbers are unchanged: there are several instances of new vectorization, but they are not hot enough to change the numbers. I also noticed a good vectorization example in Phoronix's encode-mp3-1.7.3: two FFT kernels, with fft_short ~8% faster and fft_long ~3% faster on an Intel(R) Core(TM) i7-6700HQ.
Mar 1 2019
Changed the decision so that the tree is reduced only if the cost of vectorizing is too high, and the tree is then reduced to its highest element that is still minimally profitable.
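One possible reading of this reduction rule is sketched below. It is a standalone illustration, not the patch's code: `findProfitableCut` and the flat, root-first vector of per-node costs are hypothetical, and it assumes the SLP vectorizer's convention that a negative cost means vectorizing is a win.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// NodeCosts lists hypothetical per-node costs, root first, so a prefix of
// the vector stands for a cut-down tree. Negative cost means a saving.
//
// Returns how many leading nodes to keep so that the kept part is as tall
// as possible while its summed cost stays below -Threshold.
// Returns 0 when no prefix is profitable.
std::size_t findProfitableCut(const std::vector<int> &NodeCosts,
                              int Threshold) {
  assert(Threshold >= 0 && "threshold is expected to be non-negative");
  std::size_t Best = 0;
  int Sum = 0;
  for (std::size_t I = 0; I < NodeCosts.size(); ++I) {
    Sum += NodeCosts[I];
    if (Sum < -Threshold)
      Best = I + 1; // tallest profitable prefix seen so far
  }
  return Best;
}
```

With costs `{-3, -2, 4, 5}` and a threshold of 0, the full tree costs 4 and is rejected, but the first three nodes sum to -1, so the sketch keeps 3 nodes instead of giving up on vectorization entirely.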
Feb 26 2019
Sorry, I found my previous results not quite reliable. I have changed my methodology and will rerun the SPEC benchmarks.
Fix cost calculation, fix the run-time error by adding external users for canceled elements in reduceTree(), add testcase.
Feb 22 2019
Looks like I addressed the issue with leaf gather nodes.
Feb 19 2019
Oh, I rechecked the algorithm at line 2660 and it looks correct. Sorry.
I found an incorrect way of summing costs at line 2660; it should be a backward loop here.
Feb 18 2019
Fixed several issues with cost estimation, added support for tree height reduction even for profitable trees.
Feb 10 2019
Cleaned up the current SLP throttle implementation and applied some of Vasileios' remarks. Here are the SPEC 2k6 results before and after the change on an i7-6700HQ.
Feb 5 2019
Feb 4 2019
Feb 3 2019
Feb 1 2019
It is quite rare that it decides to duplicate once duplication is limited to a maximum number of elements of less than half of the vector size and the cost of the transformation is calculated. Abandoning.
Jan 29 2019
Updated the duplicate.ll testcase.
Jan 18 2019
Removed bundle reordering by replacing pseudo instructions with real ones.
Dec 28 2018
Dec 7 2018
Introduced the InstructionOrPseudo structure and removed "Instruction::Invoke" from the tryToRepresentAsInstArg() function, with an error example in the invoke.ll testcase.
Dec 4 2018
Nov 27 2018
LGTM, with @spatel's remark about the comment.
Nov 14 2018
Oops, I found a few typos. Formatted tryToRepresentAsInstArg() and removed the second SRem from isRemainder().
Looks like I fixed all previous remarks. Also, during testing I found two more issues with the change and fixed both.
Nov 7 2018
Oct 31 2018
Implemented Map<Instruction *, std::pair<Value *Parent, unsigned Opcode>> indexing for ScalarToTreeEntry and PseudoInstScheduleDataMap.
Added reorderBundles() function to reorder bundles that have common instructions according to instructions layout after SLP scheduling.
Oct 22 2018
Oct 13 2018
Update after I found another couple of errors during additional testing of the change. Here are the changes:
Removed the OpValue field from PseudoScheduleData.
Forbid any bundles with non-alternative operations and remainder operation, see rem-bundle.ll.
Fixed an error in the setInsertPointAfterBundle() function by using getScheduleData() instead of getInstrScheduleData; if a bundle member is present in multiple bundles at the same time, walk through the bundle to find the last scheduled member of the bundle. See insert-after-multiple-bundle.ll.
Restored MemoryDependencies to SmallVector; we don't have to count a member's presence in calculateDependencies().