User Details
- User Since
- Sep 3 2013, 10:25 PM (526 w, 14 h)
Feb 25 2018
LGTM.
Thanks
Feb 12 2018
Looks good to me.
Thanks
Oct 4 2016
Ping..
Updated diff.
Jun 15 2016
Jun 14 2016
Thanks
Karthik
Jun 13 2016
Jun 12 2016
Abandon old review as im no longer working on the same.
Thanks
Jun 10 2016
Thanks David for the comments. I get your point now. Updated the code to incorporate comment.
Jun 9 2016
Hi David, Matt,
Thanks a lot for the review comments. Please find my comments below-
Jun 8 2016
Update the patch to handle NaN's.
As escha suggested the previous code will fail in case on NaN.
We will need runtime checks to see if the value is NaN and select X or Y accordingly.
Thanks escha for the counter example. I will handle NaN case properly.
Apart from that it should work properly right?
Aug 13 2015
Rebase patch and resolve confilts.
LNT Statistics:
Number of Cross Block Stores eliminated in LLVM-LNT: 849
Number of Benchmarks improved: 2
MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1 : 38.86% MultiSource/Benchmarks/mafft/pairlocalalign : 5.36%
No Regressions observed.
Aug 11 2015
Updated patch as per Erik's comments.
Thanks and Regards
Karthik Bhat
Sorry guys for multiple updates on patch..:( forgot to move dyn_cast<StoreInst> above getModRefInfo in one of the place in the previous patch.
Regards
Karthik Bhat
Run clang-format on updated patch and implement Hal's review comments.
I agree with Hal that we may hoist dyn_cast<StoreInst> above ModRef check. In case of store we can rely on Alias Analysis's alias to check if the stores mustalias. For other cases we can use getModRefInfo to make make sure that this instruction which is not a store does not Mod/Ref the pointer. Updated and tested the code after modification.
Please let me know your inputs on the same.
Aug 10 2015
Updated patch as per Erik's comments.
Hi Erik,
Thanks for looking into the patch. Please find my comments inline. Will post the updated patch shortly.
Thanks and Regards
Karthik Bhat
Hi Daniel,
Thanks for the comments.. Filed a bug as
https://llvm.org/bugs/show_bug.cgi?id=24415 .
I too will try to have look at this issue seperatly.. Any other comments on this patch Daniel?
Thanks for your valuable time.
Regards
Karthik
Hi All,
Rebased patch and implemented few review comments.
Please let me know your inputs on the same.
Thanks and Regards
Karthik Bhat
Hi Daniel,Hal,
Thanks for the review. Please find my comments inline. I will try to upload the updated patch shortly.
Thanks and Regards
Karthik
Aug 4 2015
Hi Hal,
Thanks for looking into the patch. Please find my comments inline.
Regards
Karthik Bhat
Aug 3 2015
Jul 28 2015
Hi All,
Update the patch to fix a failure in LLVM-LNT testcase on 32bit ubuntu. Unfortunetly i was not able to catch it on my 64bit machine.
This failure occured as we incorrectly deleted a store that was followed by a store that partially aliased it.
When seraching if a candidate store is safe we conservately mark a store as unsafe if we found a store that may/partially alias the candidate store.
This fixes failure found in LLVM LNT on 32bit ubuntu.
We still observe some 30%+ execution time improvement in MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1.
Jul 27 2015
Hi Hal,
Thanks for the review. Please find the updated patch addressing review comments.
Please let me know your opinion on the same.
Thanks and Regards
Karthik Bhat
Jul 21 2015
Jul 16 2015
Hi Daniel,
Please find the updated patch as per your inputs. I have removed the dependency from PDT Frontier and using the DFS in/out on PDT to find the candidate store.
Jul 14 2015
Hi Daniel,
Thanks for the input. Yes this approach is definitely very expensive. I was surprised when i didn't see any compile time regressions on LLVM LNT.
Your approach of using postdom and dfs in/out seems very interesting I will try out the same and update shortly.
Thanks for your valuable inputs. Will get back with the updated patch shortly.
Jul 13 2015
Hi Evgeny,
Thanks for looking into the patch. I think the reason dom frontier is deprecared is that it is costly to calculate the same.. im not sure as well though.
Im currently using postdom frontier to get the basic block boundry till which a store might overwrite a memory location making it dead. Dead store elimation without having postdom seems to be a bit tough especially given the overhead it would require without postdom.
Thanks
Karthik
Attaching LLVM LNT Test results.(15 Samples)
Execution time of 2 benchmarks improved post Cross block DSE patch. Surprisingly i didn't see any compile time regression.
Jul 5 2015
gentle ping...
Jun 30 2015
Jun 26 2015
Thanks Anna. (Has been committed as r240800)
Hi Anna,
Thanks for the input. Your code seems much more structured. Updated the code and tests as per your comments.
I had added-
if (SM.isInMainFile(SL)) return Mode;
because my understanding was we do not analyze header files till analyze-all option is specified but it seems i was wrong we do analyze function definition inside header files if called from the main file(checked in llvm 3.5). Updated the code to remove the check.
Jun 23 2015
Hi Anna,
Thanks for the review. Please find the updated code as per review comments.
Please let me know if this looks good to you.
Thanks and Regards
Karthik Bhat
Jun 22 2015
gentle ping..
Jun 1 2015
Apr 22 2015
Hi Renato,
Updated test cases as per comments. Please let me know if this looks good to you.
Verified with Debug+Assert build and make check-all on X86_64.
Apr 21 2015
This has been submitted as r235284.
Not getting an option to close revision hence abondoning it..:(
Hi James,Renato,
Updated LoopInterchange.cpp as per review comments. Please have a look when you find time.
Thanks a lot for your time and guidance.
Thanks and Regards
Karthik Bhat
Hi James, Renato,
Thanks for the comments. Updated the code to address review comments in LoopInterchange. Please find my comments inline.
Thanks and Regards
Karthik Bhat
Apr 20 2015
Hi Renato,
Thanks for the review and time. Updating code to reflect changes done in r235284.
Apr 19 2015
Thanks Renato. Comitted as r235284.
Apr 17 2015
Hi Adam,
Thanks for the link. It was quite useful. Updated the code to remove redundant #includes.
Please let me know if you have any other comments.
Thanks and Regards
Karthik Bhat
Hi Renato,
Thanks for the review. Addressing review comments.
I have made ReductionInstDesc and ReductionDescriptor as class as per your suggestion. Made member variables as private and exposed them through get functions so that they can be used in LoopVectorizer. I hope you are ok with this change?
Hi All,
Updated the code to address Adam,Renato's comments.
Following are the changes-
- Moved Enums inside structs.
- Marked local functions in struct as static.
- Moved functions inside of ReductionDescriptor.
Apr 16 2015
Hi Renato,
Updated code as per reveiw comments. Refactored reduction identification code out of loop vectorizer and reusing the same in this pass. This code assumes D9046 has been applied.
Hi Renato,
Thanks for your valuable inputs. I have addressed one of your comments and modified the comments as per your suggestion.
Will wait for your/James input on if to move the enums into namespace/structure.
Hi James,
Thanks for the review. Yes it makes sense to move it out of LoopVectorizer.cpp. Updated the code as per reveiw comments.
Added a file LoopUtils.cpp in lib/Transforms/Utils with refactored code. We can add any additional common loop functionalities in this util file.
Apr 15 2015
Apr 14 2015
Hi Hal,Renato,
A gentle ping for review.
Apr 5 2015
A gentle ping..
Mar 27 2015
Add full context diff.
Hi Renato,
Sorry for the delay in followup on this patch was stuck in some other work. I was able to refactor isInductionVar out of LoopVectorizer and we are using the re-factored function in this patch.
Mar 26 2015
Thanks Arnold. Committed as r233352.
Mar 25 2015
Mar 23 2015
Thanks Renato, Tobias for your inputs. Please find my comments inline-
Mar 17 2015
Hi Hal,Renato,
Refactor some common code into functions. I have currently borrowed and modified some functions from loop vectorizer. Do i need to refactor them into a common utility as well? These functions such as AddReductionVar seems to be a bit tightly bound with loop vectorizer code.
Mar 13 2015
Hi Renato,
Thanks for looking into the patch. Please find my comments inline.
Wouldn't this be better to be integrated into the vectorizer? Or at least, not duplicated code from the vectorizer?
Since this is a complete pass integrating this will loop vectorizer would be a bit difficult. I can try to move common functions into a utility file if required.
Mar 10 2015
Thanks comitted as r231788.
Thanks Hal. Comitted as r231784.
Will try to understand delinearization analysis and get back shortly.
Can you also have a look at D8059 when you find time.
Thanks and Regards
Karthik Bhat
Hi Hal,
The logic added currently prevents marking the GEP as UsefulGEP and hence different GEP index are not compared independently.
Dependency analysis currently reports an anti dependency between the 2 memory access as -
da analyze - consistent input [S S]! da analyze - anti [S S|<]! da analyze - consistent output [S S]!
as it is not considered as UsefulGEP as it doesn't know if k can overflow or not.
Mar 9 2015
Mar 8 2015
Patch was moved to D7499 and committed as r231458.
Mar 6 2015
Thanks Hal. Committed as r231458 after implementing review comments. Will raise a review for DependencyAnalysis fix shortly.
Thanks and Regards
Karthik Bhat
Mar 5 2015
Hi All,
Rebase to trunc and update the test cases to reflect recent changes in IR format.
Mar 4 2015
Feb 26 2015
Hi Hal,
Please find the updated patch attached. Addressed review comments and fixed few issues in Loop Interchange found during llvm lnt regression. After this change we find some improvement in execution time of 2 benchamrk test cases as shown in the previous post. There are few issues in Dependency analysis as you mentioned I'm planning to address it seperatly after looking into the module in more detail. Hope that should be fine?
Please let me know your inputs on the patch.
Thanks for your time and help.
Regards
Karthik Bhat
Hi Hal,
Please find my comments inline. Updated the patch as per review comments and fixed few issues found during llvm lnt regression.
The current version of loop interchange gives some 30% improvement in execution time in 2 benchmarks. This is because it contains code fragments like -
for (i = 0; i < _PB_N; i++) for (j = 0; j < _PB_N; j++) x[i] = x[i] + beta * A[j][i] * y[j];
which gets benefited after interchange.
There are few compile time regression which can be because of the heavy legality checks in loop interchange. I will try to fix this in next iteration.
Apart from this we found a crash in Dependency Analysis module which I'm planning to fix seperatly as i need to understand it in more detail. Will raise a bug on the same.
Feb 22 2015
Hi Hal,
I had some time to work on generic version of loop interchange to support any depth. This updated patch supports loops of any depth.
The loop selection algorithm currently selects the innermost loop for interchange. Going forward we can improve this heuristic to select the most profitable loop based on Dependency matrix.
To keep it simple in the first version loops with LCSSA phi are currently not handled. I will work on handling them in later iterations.
The legality and profitability logic is pretty much the same. We use dependency matrix to conclude legality of interchange of 2 loops.
Feb 19 2015
Hi Hal,
Thanks for the review.
Please find the updated patch. Moved tests from vectorize.ll to profitability.ll and interchange.ll. Checking for loop interchange as per comments. Also added a negative test case to check profitabilitymodel.(i.e. were it is legal but not profitable to interchange).
Feb 18 2015
Hi Hal,
Sorry for the delay in followup. I was on a vaction.
Please find the updated and rebased patch. Added test cases as per your suggestion. I also verified the o/p of the programs on randomly generated array and o/p's are same before and after interchange in cases were loops are interchanged.
Feb 13 2015
Hi Hal,
Updated the test case to add a test case to cover case were we have a usage of PHI outside the loop. I have added gi,gj as global vaiable and used them as induction variables in the loop to simulate this case.
Feb 12 2015
Hi Hal,
Thanks a lot for the review. Updated the patch to address review comments. Also fixed a few issues which I found during testing.
Major changes include-
- Logic to calculate profitibility has been made more acurate.
- Logic to detect were to split the inner loop is changed to be more acurate.
- Added test case to check the updated profitibility model.
Hi Hal,
Thanks for the review. Please find my comments updated below. Will upload modified patch shortly.
Feb 9 2015
Hi James,Hal,
Thanks for your inputs. Hal I will move this to llvm commits shortly was working on few issues which i found during testing.
James i tried the example-
int a = 2; for (i=0;i<N;++i) { for (j=0;j<M;++j) { A[j][i+a] = A[j][i]+1; } }
The dependency analysis returns an anti dependecy of [-2 0] between the load and store and we interchange successfully.
But if add is unknown then i assume it is not safe to interchange as we do not know for sure what dependecy exits.
Feb 6 2015
Hi James,Pekka,
Thanks for the comments and spending your valuable time on this. Please find my comments.
Feb 5 2015
Jan 23 2015
Thanks Anna i think it makes sense.
Yes David i noticed it just now after you pointed out i had incorrectly added llvm-commit insted of cfe.
Jan 22 2015
ping.
Jan 20 2015
Jan 19 2015
Thanks Michael. Commit as r226547
Hi Michael,
I agree with you on the above example. Added a FIXME comment as per your suggession.
Updated variable names as per review comments.
Thanks and Regards
Karthik Bhat
Hi Michael,
Thanks for the input. I really appreciate your help in the review process. I will rename the variables and update the patch as per review comments shortly.
Just one clarification here.
In your example -
b[0] b[0] b[1] a[0]+c[0] b[2] a[1]*c[1] b[3] d[5]/e[6]
Post reordering i think we will still have good load order but in the right lane instead of left lane. If I'm not wrong we would get something like-
b[0] b[0] a[0]+c[0] b[1] a[1]*c[1] b[2] d[5]/e[6] b[3]
Jan 18 2015
Hi Micahel, sorry i think the comments above are not readable.Some problem with phabricator formatting.
Updated comments to make if more readable.
Hi Michael,
Thanks for the review. Addressed review comments after slight modification in code flow. This version does minimal changes in reorderInputsAccordingToOpcode and to make logic clear i have added appropriate comments in the code.
The only change is the check we have added at the end of reorderInputsAccordingToOpcode to reorder operands to create a longer vectorizable chain without effecting AllSameOpcode property.
Jan 14 2015
Hi Michael,
Thanks for the reveiw. Addressed all review comments.
I'm ok with the current way we are handling reorderInputsAccordingToOpcode. If we were to move the load matching before checking AllSameOpcodeRight we would need to maintain a flag so that we do not reorder operands that were swapped during consecutive load matching.(This would end up similar to the initial version of the patch that was uploaded with needsReordering flag). I do not see much difference in the 2 approaches. I hope we can keep this as it is?
Jan 13 2015
Please find my comments inline. Thanks.
Hi Michael,
Thanks for the review. Please find the updated patch and my comments inline.
Please let me know if this is good to commit.
Thanks and Regards
Karthik Bhat
Jan 12 2015
Hi Michael,
While still at this patch fix one more issue(small change) originally present in reorderInputsAccordingToOpcode.
Hi Michael,
Thanks for your time and review comments.
Updated the patch addressing the review comments.
Jan 11 2015
Hi Michael,
One small update in the patch. I was not handling case were we have right operand as load and left as some other binary operation in case of reorderAltShuffleOperands as a result we were not vectorizing code such as -
void foo(double* c,double* restrict a,double* restrict b,double* restrict d) { c[0] = (a[0] + b[0])-d[0]; c[1] = d[1]+(a[1]+b[1]); }
This is fixed with this updated patch. This is not required in reorderInputsAccordingToOpcode as it is already handled while we sort as per the opcode.
Please let me know your inputs on the same.