- User Since
- Aug 18 2016, 4:39 AM (79 w, 1 d)
Thanks Evandro! LGTM.
Mon, Feb 19
I think this change should be fine for modern AArch64 chips. For example, Cortex-A72 has 2 integer and 2 fp units. Other modern cores have 3 integer units and 2 fp units. For both, those settings should be more realistic. I can't run any benchmarks at the moment though, as I am travelling.
Sat, Feb 17
Please update the commit message with the updates from your comment :)
Thanks Diego, LGTM. Sorry for the long delay! Please wait with committing a few days, to give other people a little bit more time to voice additional concerns.
Thu, Feb 15
Refactor tryCandidate() into digestible parts so that a target that wants to override this method with minor modifications can do so easily. I tried to do a separation into the logical parts, but I am of course open to alternatives, including name changing of the new methods.
Wed, Feb 14
Thanks for the pointers. Do you think it would make sense to add something generic which takes various analysis managers, etc, and extracts the analysis on demand, for other passes to use too?
Thanks! I will update this patch when I am back, at the end of next week.
About pattern ID in output: in theory there could be several patterns for one Root and every pattern could have several alternative sequences, right? I think in this case the pattern ID will help to distinguish all these removed/substituted sequences from each other. Am I right or it's better to remove the IDs?
I could not find a case where not considering reachability would prevent us from partial inlining with this patch.
Tue, Feb 13
One last question: is DomTreeWrapperPass always available when LoopInterchange is run? I'm thinking about requireAnalysis vs. getAnalysisIfAvailable.
Thanks! I've moved collecting and applying updates to adjustLoopLinks and added an assertion to make sure we do not have BBs with redundant successors.
Reverted again in rL325009, as memsan was not happy with the array of the ValueToValueMaps. I'll look into how to resolve this.
Thanks, and sorry for taking so long to come back to this. I think it looks good, there is just the missing newline issue. After that's resolved I would be happy to sign off on the machine combiner changes, but I would prefer if we move the lib/CodeGen/MachineInstr.cpp to a separate patch.
Mon, Feb 12
Thanks again Eli! I've updated the patch to always preserve the DT and also added -verify-dom-info to the tests that interchange loops.
Only update DT if DT != nullptr, add -verify-dom-info to a test.
Mark DominatorTreeWrapperPass as preserved with old pass manager.
updated to use use getAnalysisIfAvailable for the old pass manager and getCachedResult for the new pass manager.
Add DominatorTreeWrapperPass to preserved analysis.
Fri, Feb 9
Thanks @junbuml ! I'll wait with committing until the middle of next week or so
Thu, Feb 8
LGTM thanks. Please wait a day or so with committing so others can raise additional concerns.
Wed, Feb 7
If getUserCost is supposed to represent only the code size (I believe so), then we may need to introduce a new target hook to handle sdiv properly in getOperationCost(). @hfinkel might have better idea about it.
Fix default threshold.
Clearing it from there doesn't seem to be enough when assembling .s files from clang though.
Also with tests for each backend, this diff will get quite big. As this is opt-in, it might make sense to enable backends separately.
I like the idea. However for all backends, except Arm and AArch64, we would have to maintain another list of CPU names. At least for the targets which implement isValidCPUName, we could add an array with valid names and use that. That's still not ideal, but I do not think we should hold back this patch until all targets use the same infrastructure to handle CPUs. As for tests, we should test the note for all backends I think.
Thanks for the patch, I think this would be a nice UI improvement. I agree with Sam, please use CPUNames and AArch64CPUNames. Also there are unit tests for the target parser in unittests/Support/TargetParserTest.cpp. Could you add some for the new functions?
Tue, Feb 6
Mon, Feb 5
Update the patch to use getInstructionCost(i, TCK_CodeSize) to decide whether it is worth to perform call site splitting. Also avoid creating PHI nodes for existing PHI nodes unnecessarily.
Sun, Feb 4
Fri, Feb 2
Thu, Feb 1
Thanks for having a look! I've updated the patch to use a fixed size array for ValueToValueMapTys, used Preds.size() when creating the PHI node and updated the remaining tests.
Wed, Jan 31
Tue, Jan 30
Thanks Gerolf! The main reason for adding this is that @mssimpso pointed out in D41587 that some of the recently added patterns were not optimally ordered for at least Falkor and I wanted to give us a better chance at detecting sub-optimal orderings automatically.
Mon, Jan 29
Sun, Jan 28
I'll commit this in the next couple of days. @Gerolf do you have any additional thoughts/ideas?
Fri, Jan 26
Ah yes, the original patch went in just after the branch.
@junbuml I committed this tiny fix without prior code review. I missed that in the original patch, it should be fixed properly now.
Thu, Jan 25
I've put up a patch D42556
Jan 24 2018
Debug output with this patch for test/CodeGen/AArch64/machine-combiner.mir. I think it's an overall improvement , I just have a few more comments. I think it also would be worth explicitly stating that we applied a substitution.
LGTM with 2 small comments, I think all comments should be addressed now, thanks! Please wait with committing for another day or so, to give people some time to raise additional concerns.
Jan 23 2018
I've run spec2006, spec2000 and the test-suite on Cortex-A57. Slight improvement in the geomean speedup, no perf regressions, not size regressions.
Jan 22 2018
Update code to insert new PHIs at the beginning of TailBB and delete instructions up to the original first instruction. This way, remove unnecessary existing PHI nodes, while not deleting newly created ones.